Re: Improve hash join's handling of tuples with null join keys

From: "Joel Jacobson" <joel(at)compiler(dot)org>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Chao Li" <li(dot)evan(dot)chao(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Improve hash join's handling of tuples with null join keys
Date: 2026-03-04 07:04:38
Message-ID: 3ccec807-d2ff-4150-9585-74cc52926646@app.fastmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 3, 2026, at 21:58, Tom Lane wrote:
> I wrote:
>> Bug #19030 [1] seems to be a fresh report of the problem this patch
>> aims to solve. While answering that, I realized that the v2 patch
>> causes null-keyed inner rows to not be included in EXPLAIN ANALYZE's
>> report of the number of rows output by the Hash node. Now on the
>> one hand, what it's reporting is an accurate reflection of the
>> number of rows in the hash table, which perhaps is useful. On the
>> other hand, it's almost surely going to confuse users, and it's
>> different from the number we produced before. Should we try to
>> preserve the old behavior here? (I've not looked at what code
>> changes would be needed for that.)
>
> I got around to looking at that finally. It's not terribly difficult
> to fix, but while figuring out which counters were used for what,
> I noticed a pre-existing bug: when ExecHashRemoveNextSkewBucket moves
> tuples into the main hash table from the skew hash table, it fails to
> adjust hashtable->skewTuples, meaning that subsequent executions of
> ExecHashTableInsert will have the wrong idea of how many tuples are in
> the main table. The error is probably not very large because the
> skew table is not supposed to be big relative to the main table,
> but still, it's wrong. So I tried to clean that up here.
>
> 0001 attached is the same patch as before (brought up to HEAD, but
> only line numbers change). 0002 is the new code to fix these
> tuple-counting issues.
>
> regards, tom lane

I've tested v3-0001 and v3-0002 and can confirm the bug introduced
in v3-0001 is fixed in v3-0002:

% cat explain-analyze-problem.sql
CREATE TABLE ea_hash (id int);
INSERT INTO ea_hash SELECT g FROM generate_series(1, 10) g;
INSERT INTO ea_hash SELECT NULL FROM generate_series(1, 90);
ANALYZE ea_hash;

CREATE TABLE ea_probe (id int);
INSERT INTO ea_probe SELECT (g % 10) + 1 FROM generate_series(1, 10000) g;
ANALYZE ea_probe;

SET enable_nestloop = off;
SET enable_mergejoin = off;

EXPLAIN (COSTS OFF, ANALYZE, TIMING OFF, BUFFERS OFF, SUMMARY OFF)
SELECT count(*) FROM ea_probe FULL OUTER JOIN ea_hash ON ea_probe.id = ea_hash.id;

EXPLAIN (COSTS OFF, ANALYZE, TIMING OFF, BUFFERS OFF, SUMMARY OFF)
SELECT count(*) FROM ea_probe RIGHT OUTER JOIN ea_hash ON ea_probe.id = ea_hash.id;

% git diff --no-index master.out v3-0001.out
diff --git a/master.out b/v3-0001.out
index 1e05e7e39a6..54210c49757 100644
--- a/master.out
+++ b/v3-0001.out
@@ -17,8 +17,8 @@ SET
-> Hash Full Join (actual rows=10090.00 loops=1)
Hash Cond: (ea_probe.id = ea_hash.id)
-> Seq Scan on ea_probe (actual rows=10000.00 loops=1)
- -> Hash (actual rows=100.00 loops=1)
- Buckets: 1024 Batches: 1 Memory Usage: 12kB
+ -> Hash (actual rows=10.00 loops=1)
+ Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on ea_hash (actual rows=100.00 loops=1)
(7 rows)

@@ -29,8 +29,8 @@ SET
-> Hash Right Join (actual rows=10090.00 loops=1)
Hash Cond: (ea_probe.id = ea_hash.id)
-> Seq Scan on ea_probe (actual rows=10000.00 loops=1)
- -> Hash (actual rows=100.00 loops=1)
- Buckets: 1024 Batches: 1 Memory Usage: 12kB
+ -> Hash (actual rows=10.00 loops=1)
+ Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on ea_hash (actual rows=100.00 loops=1)
(7 rows)

% git diff --no-index v3-0001.out v3-0002.out
diff --git a/v3-0001.out b/v3-0002.out
index 54210c49757..17dfe335b0b 100644
--- a/v3-0001.out
+++ b/v3-0002.out
@@ -17,7 +17,7 @@ SET
-> Hash Full Join (actual rows=10090.00 loops=1)
Hash Cond: (ea_probe.id = ea_hash.id)
-> Seq Scan on ea_probe (actual rows=10000.00 loops=1)
- -> Hash (actual rows=10.00 loops=1)
+ -> Hash (actual rows=100.00 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on ea_hash (actual rows=100.00 loops=1)
(7 rows)
@@ -29,7 +29,7 @@ SET
-> Hash Right Join (actual rows=10090.00 loops=1)
Hash Cond: (ea_probe.id = ea_hash.id)
-> Seq Scan on ea_probe (actual rows=10000.00 loops=1)
- -> Hash (actual rows=10.00 loops=1)
+ -> Hash (actual rows=100.00 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on ea_hash (actual rows=100.00 loops=1)
(7 rows)

% git diff --no-index master.out v3-0002.out
diff --git a/master.out b/v3-0002.out
index 1e05e7e39a6..17dfe335b0b 100644
--- a/master.out
+++ b/v3-0002.out
@@ -18,7 +18,7 @@ SET
Hash Cond: (ea_probe.id = ea_hash.id)
-> Seq Scan on ea_probe (actual rows=10000.00 loops=1)
-> Hash (actual rows=100.00 loops=1)
- Buckets: 1024 Batches: 1 Memory Usage: 12kB
+ Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on ea_hash (actual rows=100.00 loops=1)
(7 rows)

@@ -30,7 +30,7 @@ SET
Hash Cond: (ea_probe.id = ea_hash.id)
-> Seq Scan on ea_probe (actual rows=10000.00 loops=1)
-> Hash (actual rows=100.00 loops=1)
- Buckets: 1024 Batches: 1 Memory Usage: 12kB
+ Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on ea_hash (actual rows=100.00 loops=1)
(7 rows)

/Joel

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Hayato Kuroda (Fujitsu) 2026-03-04 07:07:51 RE: BUG: Former primary node might stuck when started as a standby
Previous Message Daniil Davydov 2026-03-04 06:58:49 Re: POC: Parallel processing of indexes in autovacuum