Improve hash join's handling of tuples with null join keys

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Improve hash join's handling of tuples with null join keys
Date: 2025-05-05 23:11:54
Message-ID: 3061845.1746486714@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

The attached patch is a response to the discussion at [1], where
it emerged that lots of rows with null join keys can send a hash
join into too-many-batches hell, if they are on the outer side
of the join so that they must be null-extended not just discarded.
This isn't really surprising given that such rows will certainly
end up in the same hash bucket, and no amount of splitting can
reduce the size of that bucket. (I'm a bit surprised that the
growEnabled heuristic didn't kick in, but it seems it didn't,
at least not up to several million batches.)

Thinking about that, it occurred to me to wonder why we are putting
null-keyed tuples into the hash table at all. They cannot match
anything, so all we really have to do with them is emit one
null-extended copy. Awhile later I had the attached, which shoves
such rows into a tuplestore that's separate from the hash table
proper, ensuring that they can't bollix our algorithms for when to
grow the hash table. (For tuples coming from the right input, we
need to use a tuplestore in case we're asked to rescan the existing
hashtable. For tuples coming from the left input, we could
theoretically emit 'em and forget 'em immediately, but that'd require
some major code restructuring so I decided to just use a tuplestore
there too.)

This passes check-world, and I've extended a couple of existing test
cases to ensure that the new code paths are exercised. I've not done
any real performance testing, though.

regards, tom lane

[1] https://www.postgresql.org/message-id/flat/18909-e5e1b702c9441b8a%40postgresql.org

Attachment Content-Type Size
v1-0001-Improve-hash-join-s-handling-of-tuples-with-null-.patch text/x-diff 34.1 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2025-05-05 23:56:27 Re: POC: Parallel processing of indexes in autovacuum
Previous Message Jacob Champion 2025-05-05 23:06:34 [PATCH] Fix hostaddr crash during non-blocking cancellation