From: | Melanie Plageman <melanieplageman(at)gmail(dot)com> |
---|---|
To: | Tomas Vondra <tomas(at)vondra(dot)me> |
Cc: | Nathan Bossart <nathandbossart(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Rowley <dgrowleyml(at)gmail(dot)com>, Vaibhav Jain <jainva(at)google(dot)com>, pgsql-hackers(at)postgresql(dot)org, Madhukar <madhukarprasad(at)google(dot)com>, Sangeetha Seshadri <sangsesh(at)google(dot)com> |
Subject: | Re: Fix overflow of nbatch |
Date: | 2025-10-09 14:43:52 |
Message-ID: | CAAKRu_aBt2kvQ5CY0Hq=Bqhdu348LzEAk-FCdM2O+yjL1W6LjQ@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Oct 8, 2025 at 5:08 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>
> On 10/8/25 21:16, Melanie Plageman wrote:
> > On Wed, Oct 8, 2025 at 1:37 PM Melanie Plageman
> > <melanieplageman(at)gmail(dot)com> wrote:
> >>
> >> I have updated my patch to fix the mistakes above. I also noticed then
> >> that I wasn't doubling space_allowed in the loop but instead setting
> >> it to hash_table_bytes at the end. This doesn't produce a power of 2
> >> because we subtract skew_mcvs from the hash_table_bytes. So, we have
> >> to keep using space_allowed if we want a power of 2 in the end.
> >>
> >> I've changed my patch to do this, but this made me wonder if we want
> >> to be doing this or instead take hash_table_bytes at the end and round
> >> it up to a power of 2 and set space_allowed to that. If the skew
> >> hashtable is large, we may be allocating way more space_allowed than
> >> we need for new hash_table_bytes + skew hashtable buckets.
> >
>
> I don't think there's any promise hash_table_bytes being a power of 2.
> You can make hash_table_bytes an almost arbitrary value by setting
> work_mem and hash_mem_multiplier. Or am I missing something?
Right. Your example used powers of 2 for work_mem, which is why I saw
power of 2 output for space_allowed, but that is arbitrary.
> But you're right hash_table_bytes and space_allowed may not be equal if
> useskew=true. So setting space_allowed to hash_table_bytes at the end
> does not seem right. I think we don't actually need hash_table_bytes at
> this point, we can just ignore it, and use/double *space_allowed.
>
> I kept using hash_table_bytes mostly because it didn't require the
> pointer dereferencing, but I failed to consider the useskew=true thing.
We could make some local variable of space_allowed if we want.
> However, this means there's probably a bug - the loop should probably
> double num_skew_mcvs too. We simply reserve SKEW_HASH_MEM_PERCENT of
> space_allowed for skew hashtable, so should we adjust it the same way?
We do need to recalculate num_skew_mcvs once at the end before returning.
But I don't think we need to do it on each iteration of the loop. We
want the total size of what is in memory -- the regular and skew
hashtable -- to be the condition we break on. I think that's
space_allowed.
> > Oh wait, that doesn't make sense because each batch could have a skew hashtable.
>
> Not sure I understand. Is this the same issue I just described?
Yea, what I meant is that we are increasing the size of the hashtable
and decreasing the number of batches on the assumption that we might
end up with tuples from multiple batches consolidated down to one
batch and one larger hashtable. Each of those batches could have skew
buckets so we now end up needing a larger skew hashtable too, not just
a larger regular hashtable.
- Melanie
From | Date | Subject | |
---|---|---|---|
Next Message | Nathan Bossart | 2025-10-09 14:45:15 | Re: Fix for compiler warning triggered in WinGetFuncArgInPartition() |
Previous Message | Nathan Bossart | 2025-10-09 14:41:15 | Re: Fix for compiler warning triggered in WinGetFuncArgInPartition() |