Re: Fix overflow of nbatch

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Rowley <dgrowleyml(at)gmail(dot)com>, Vaibhav Jain <jainva(at)google(dot)com>, pgsql-hackers(at)postgresql(dot)org, Madhukar <madhukarprasad(at)google(dot)com>, Sangeetha Seshadri <sangsesh(at)google(dot)com>
Subject: Re: Fix overflow of nbatch
Date: 2025-10-08 21:08:45
Message-ID: 5e073665-a73b-42c3-a483-253ab7a6fd87@vondra.me
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 10/8/25 21:16, Melanie Plageman wrote:
> On Wed, Oct 8, 2025 at 1:37 PM Melanie Plageman
> <melanieplageman(at)gmail(dot)com> wrote:
>>
>> I have updated my patch to fix the mistakes above. I also noticed then
>> that I wasn't doubling space_allowed in the loop but instead setting
>> it to hash_table_bytes at the end. This doesn't produce a power of 2
>> because we subtract skew_mcvs from the hash_table_bytes. So, we have
>> to keep using space_allowed if we want a power of 2 in the end.
>>
>> I've changed my patch to do this, but this made me wonder if we want
>> to be doing this or instead take hash_table_bytes at the end and round
>> it up to a power of 2 and set space_allowed to that. If the skew
>> hashtable is large, we may be allocating way more space_allowed than
>> we need for new hash_table_bytes + skew hashtable buckets.
>

I don't think there's any promise hash_table_bytes being a power of 2.
You can make hash_table_bytes an almost arbitrary value by setting
work_mem and hash_mem_multiplier. Or am I missing something?

But you're right hash_table_bytes and space_allowed may not be equal if
useskew=true. So setting space_allowed to hash_table_bytes at the end
does not seem right. I think we don't actually need hash_table_bytes at
this point, we can just ignore it, and use/double *space_allowed.

I kept using hash_table_bytes mostly because it didn't require the
pointer dereferencing, but I failed to consider the useskew=true thing.

However, this means there's probably a bug - the loop should probably
double num_skew_mcvs too. We simply reserve SKEW_HASH_MEM_PERCENT of
space_allowed for skew hashtable, so should we adjust it the same way?

> Oh wait, that doesn't make sense because each batch could have a skew hashtable.
>

Not sure I understand. Is this the same issue I just described?

regards

--
Tomas Vondra

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jacob Champion 2025-10-08 21:08:49 Re: Thoughts on a "global" client configuration?
Previous Message Andres Freund 2025-10-08 20:52:17 Re: Should we update the random_page_cost default value?