Re: Fix overflow of nbatch

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Rowley <dgrowleyml(at)gmail(dot)com>, Vaibhav Jain <jainva(at)google(dot)com>, pgsql-hackers(at)postgresql(dot)org, Madhukar <madhukarprasad(at)google(dot)com>, Sangeetha Seshadri <sangsesh(at)google(dot)com>
Subject: Re: Fix overflow of nbatch
Date: 2025-10-07 21:10:02
Message-ID: CAAKRu_Zf8i-nKct0emXE0wqcUnjRtpB_dm3_2AeMBzaBvHBe0Q@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Sep 24, 2025 at 7:02 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>
> Here's a couple draft patches fixing the bug:
>
> - 0001 adds the missing size_t cast, to fix the overflow
> - 0002 fixes the balancing, by adjusting the hash table size limit
> - 0003 adds the missing overflow protection for nbuckets and the hash
> table limit

I've attached an alternative patch idea. It's shorter because I
avoided multiplication using algebraic equivalence. There are two
overflow checks for nbuckets and that max_pointers will be able to be
allocated, but otherwise, it avoids most of the overflow risk by
switching to division.

On the topic of max_pointers, I think there is no need to handle >
MaxAllocSize. We were willing to cope with the original values of
nbatch and nbuckets, we are just trying to optimize that. If doing so
would make us run out of memory for the arrays of poitners to buckets,
just use the last hashtable size that didn't have this problem. That
means we don't have to do any shenanigans to have powers of two for
nbuckets because we don't do clamping.

Also, I think the outer loop needs the condition nbatch > 1 not nbatch
> 0 -- when nbatch is 1, once we divide it by 2, it would end up as 0.

I'm still waiting for the 4billion row table to be created on my
machine, so I haven't verified that I get the same results as you yet.

> - 0004 rewords the comment explaining how the balancing works. Reading
> it after a couple months, I found it overly verbose / not very clear.
> I'm sure it could be improved even further.

My attached patch does build on your revised wording.

- Melanie

Attachment Content-Type Size
alt-Fix-overflow-risk-in-hashtable-size-calculation.patch text/x-patch 7.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Matheus Alcantara 2025-10-07 21:14:14 Re: Optimize LISTEN/NOTIFY
Previous Message Peter Geoghegan 2025-10-07 21:08:36 Re: Should we update the random_page_cost default value?