Re: Fix overflow of nbatch

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Rowley <dgrowleyml(at)gmail(dot)com>, Vaibhav Jain <jainva(at)google(dot)com>, pgsql-hackers(at)postgresql(dot)org, Madhukar <madhukarprasad(at)google(dot)com>, Sangeetha Seshadri <sangsesh(at)google(dot)com>
Subject: Re: Fix overflow of nbatch
Date: 2025-10-09 14:16:21
Message-ID: CAAKRu_aTSfccyc_nK6ik0K4rBROzXaq74qneZZp3xmmQSDMQMg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Oct 8, 2025 at 4:46 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>
> On 10/8/25 19:37, Melanie Plageman wrote:
>
> > Yes, this was wrong. I forgot an extra / sizeof(HashJoinTuple). I meant:
> >
> > hash_table_bytes / sizeof(HashJoinTuple) > MaxAllocSize /
> > sizeof(HashJoinTuple) / 2
>
> But the hash table is not allocated as a single chunk of memory, so I
> think MaxAllocSize would be the wrong thing to use here, no? Hash table
> is a collection of tuples (allocated one by one), that's why
> get_hash_memory_limit() uses SIZE_MAX. MaxAllocSize matters for
> nbuckets, because that indeed is allocated as a contiguous array, ofc.
>
> Also, why bother with /sizeof(HashJoinTuple) on both sides? Without it
> we get
>
> hash_table_bytes > MaxAllocSize / 2
>
> but again, that doesn't make much sense - the hash table can be larger,
> it's not a single palloc.

It came from the earlier clamping of nbuckets:

max_pointers = hash_table_bytes / sizeof(HashJoinTuple);
max_pointers = Min(max_pointers, MaxAllocSize / sizeof(HashJoinTuple));
dbuckets = Min(dbuckets, max_pointers);

I don't really get why this divides hash_table_bytes by
sizeof(HashJoinTuple) since, as you say, hash_table_bytes is supposed
to accommodate both the bucket_bytes and inner_rel_bytes.

> And I think it should be just
>
> if (hash_table_bytes > SIZE_MAX / 2)
> break;
>
> as a protection against hash_table_bytes overflowing SIZE_MAX (which on
> 64-bits seems implausible, but could happen on 32-bit builds).

That's roughly the check I ended up with -- except I used
space_allowed because it will be larger than hash_table_bytes if there
is a skew hashtable. I've updated the comment and such in attached v3.

- Melanie

Attachment Content-Type Size
v3-0001-Fix-overflow-risk-in-hashtable-size-calculation.patch text/x-patch 7.4 KB
