Re: Should HashSetOp go away

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Should HashSetOp go away
Date: 2025-10-26 20:16:20
Message-ID: 2219168.1761509780@sss.pgh.pa.us
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:
> Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
>> I noticed some changes in this code v18, so wanted to revisit the issue.
>> Under commit 27627929528e, it looks like it got 25% more memory efficient,
>> but it thinks it got 40% more efficient, so the memory use got better but
>> the estimation actually got worse.

> Hmm, so why not fix that estimation?

So I poked at this a little bit, and found a few factors affecting it:

* Tuple hash tables are typically given license to use twice work_mem;
see hash_mem_multiplier. Not sure if you were accounting for that
in your test.

* create_setop_path's required-space estimate of entrysize * numGroups
was rather lame before commit 5dfc19814, and it's even more so
afterwards. It's basically only accounting for the tuples themselves,
and not either the hashtable overhead or the SetOpStatePerGroupData
counter space. With wide tuples that might disappear into the noise,
but with only 16-ish data bytes per tuple it's all about the overhead.
On my machine this example uses 80 bytes per tuple in the "SetOp hash
table" context and another 16 or more in the simplehash hashtable.
So about triple what the planner thought.

* We can buy some of this back nearly for free, by switching the
SetOp hash table context to be a BumpContext not an AllocSetContext.
That doesn't move the needle too much in this example with
MEMORY_CONTEXT_CHECKING on, but with it off, the SetOp hash table
consumption drops to 48 bytes/tuple. (Also, for other tuple widths,
AllocSetContext's round-up-to-power-of-2 behavior could hurt a lot
more than it does here.)

* To do better, we probably need to take this computation out of the
planner and have execGrouping.c expose a function to estimate the
TupleHashTable size for N entries and such-and-such average data
width. That in turn will require simplehash.h to expose a function
for estimating the size of its tables, because AFAICS its callers are
not supposed to know such details.

Attached is a quick-hack patch to use a BumpContext for this purpose
in nodeSetOp.c. We should probably look at whether other users of
BuildTupleHashTable can do similarly. I haven't thought hard about
what better-factorized space estimation would look like.

regards, tom lane

Attachment Content-Type Size
use-bump-context-in-nodeSetOp.patch text/x-diff 998 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2025-10-26 20:55:11 Use BumpContext contexts for TupleHashTables' tablecxt
Previous Message Jeff Davis 2025-10-26 19:43:01 Re: C11: should we use char32_t for unicode code points?