Re: Should HashSetOp go away

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Should HashSetOp go away
Date: 2025-10-26 18:00:17
Message-ID: 2156464.1761501617@sss.pgh.pa.us
Lists: pgsql-hackers

Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
> I noticed some changes in this code in v18, so I wanted to revisit the issue.
> Under commit 27627929528e, it looks like it got 25% more memory efficient,
> but it thinks it got 40% more efficient, so the memory use got better but
> the estimation actually got worse.

Hmm, so why not fix that estimation?

> I was thinking of ways to improve the memory usage (or at least its
> estimation) but decided maybe it would be better if HashSetOp went away
> entirely. As far as I can tell HashSetOp has nothing to recommend it other
> than the fact that it already exists. If we instead used an elaboration on
> Hash Anti Join, then it would automatically get spilling to disk, parallel
> operations, better estimation, and the benefits of whatever micro
> optimizations people lavish on the highly used HashJoin machinery but not
> the obscure, little-used HashSetOp.

This seems like a pretty bad solution. It would imply exporting the
complexities of duplicate-counting for EXCEPT ALL and INTERSECT ALL
modes into the hash-join logic. We don't need that extra complexity
there (it's more than enough of a mess already), and we don't need
whatever performance hit ordinary hash joins would take.
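Here is a minimal C sketch of the duplicate counting in question. It assumes each distinct row's hash entry carries two counters, one per input. The names GroupCounts, SetOpCmd and copies_to_emit are hypothetical and only illustrate the idea; this is not PostgreSQL's code. The output rules themselves are the standard SQL semantics of the four set operations.

```c
/* Hypothetical per-group state for one distinct row: how many copies
 * of it arrived from each input.  Illustrative only. */
typedef struct
{
    long numLeft;   /* copies seen in the left input */
    long numRight;  /* copies seen in the right input */
} GroupCounts;

/* Hypothetical command codes for the four set operations. */
typedef enum
{
    CMD_INTERSECT,
    CMD_INTERSECT_ALL,
    CMD_EXCEPT,
    CMD_EXCEPT_ALL
} SetOpCmd;

/* How many copies of the group's row to emit, per SQL semantics. */
static long
copies_to_emit(SetOpCmd cmd, GroupCounts c)
{
    switch (cmd)
    {
        case CMD_INTERSECT:
            /* One copy if the row appears on both sides at all. */
            return (c.numLeft > 0 && c.numRight > 0) ? 1 : 0;
        case CMD_INTERSECT_ALL:
            /* min(numLeft, numRight) copies. */
            return c.numLeft < c.numRight ? c.numLeft : c.numRight;
        case CMD_EXCEPT:
            /* Roughly an anti join over de-duplicated left rows:
             * only "is there any match?" matters. */
            return (c.numLeft > 0 && c.numRight == 0) ? 1 : 0;
        case CMD_EXCEPT_ALL:
            /* max(numLeft - numRight, 0) copies. */
            return c.numLeft > c.numRight ? c.numLeft - c.numRight : 0;
    }
    return 0;
}
```

Take a row that appears three times on the left and once on the right. EXCEPT ALL emits 2 copies, INTERSECT ALL emits 1, INTERSECT emits 1, and plain EXCEPT emits none. The ALL variants need the exact per-side counts. An anti join only answers whether any match exists, which is why folding these modes into hash join would mean carrying extra counting state there.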

Also, I doubt the problem is confined to nodeSetOp. I think this is
fundamentally a complaint about BuildTupleHashTable and friends being
unable to spill to disk. Since we also use that logic for hashed
aggregates, RecursiveUnion, and hashed SubPlans, getting rid of
nodeSetOp isn't going to move the needle very far.

regards, tom lane
