| From: | "Joel Jacobson" <joel(at)compiler(dot)org> |
|---|---|
| To: | "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
| Cc: | "Tender Wang" <tndrwang(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: [BUG?] estimate_hash_bucket_stats uses wrong ndistinct for avgfreq |
| Date: | 2026-03-03 15:05:29 |
| Message-ID: | 5ddabbff-d0cb-4891-85fc-142f6d39d641@app.fastmail.com |
| Lists: | pgsql-hackers |
On Sun, Mar 1, 2026, at 22:12, Tom Lane wrote:
> Aside: you could argue that failing to consider stanullfrac is wrong,
> and maybe it is. But the more I looked at this code the more
> convinced I got that it was only partially accounting for nulls
> anyway. That seems like perhaps something to look into later.
How about adjusting estfract for the null fraction before clamping?
```diff
@@ -4461,20 +4473,27 @@ estimate_hash_bucket_stats(PlannerInfo *root, Node *hashkey, double nbuckets,
 	/*
 	 * Initial estimate of bucketsize fraction is 1/nbuckets as long as the
 	 * number of buckets is less than the expected number of distinct values;
 	 * otherwise it is 1/ndistinct.
 	 */
 	if (ndistinct > nbuckets)
 		estfract = 1.0 / nbuckets;
 	else
 		estfract = 1.0 / ndistinct;
 
+	/*
+	 * Adjust for null fraction. NULL keys are not inserted into the hash
+	 * table, but inner_path_rows in final_cost_hashjoin includes them, so we
+	 * must discount estfract to compensate.
+	 */
+	estfract *= (1.0 - stanullfrac);
+
 	/*
 	 * Clamp the bucketsize fraction to be not less than the MCV frequency,
 	 * since whichever bucket the MCV values end up in will have at least that
 	 * size. This has no effect if *mcv_freq is still zero.
 	 */
 	estfract = Max(estfract, *mcv_freq);
 
 	*bucketsize_frac = (Selectivity) estfract;
 
 	ReleaseVariableStats(vardata);
```
Here is an extreme example where 98% of all hj_nullfrac_inner.val are NULL:
```sql
CREATE TABLE hj_nullfrac_inner (id int, val int);
INSERT INTO hj_nullfrac_inner
SELECT g, CASE WHEN g <= 1000 THEN g ELSE NULL END
FROM generate_series(1, 50000) g;
CREATE INDEX ON hj_nullfrac_inner(val);

CREATE TABLE hj_nullfrac_outer (id int, val int);
INSERT INTO hj_nullfrac_outer
SELECT g, ((g-1) % 1000)+1
FROM generate_series(1, 500000) g;

ANALYZE hj_nullfrac_inner;
ANALYZE hj_nullfrac_outer;

EXPLAIN ANALYZE
SELECT * FROM hj_nullfrac_outer o JOIN hj_nullfrac_inner i ON o.val = i.val;
```
Both master (HEAD) and v2 result in the same plan:
```
                                                                           QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.30..20022.95 rows=499834 width=16) (actual time=0.016..289.314 rows=500000.00 loops=1)
   Buffers: shared hit=5212 read=1
   ->  Seq Scan on hj_nullfrac_outer o  (cost=0.00..7213.00 rows=500000 width=8) (actual time=0.008..31.449 rows=500000.00 loops=1)
         Buffers: shared hit=2213
   ->  Memoize  (cost=0.30..0.32 rows=1 width=8) (actual time=0.000..0.000 rows=1.00 loops=500000)
         Cache Key: o.val
         Cache Mode: logical
         Estimates: capacity=1000 distinct keys=1000 lookups=500000 hit percent=99.80%
         Hits: 499000  Misses: 1000  Evictions: 0  Overflows: 0  Memory Usage: 106kB
         Buffers: shared hit=2999 read=1
         ->  Index Scan using hj_nullfrac_inner_val_idx on hj_nullfrac_inner i  (cost=0.29..0.31 rows=1 width=8) (actual time=0.002..0.002 rows=1.00 loops=1000)
               Index Cond: (val = o.val)
               Index Searches: 1000
               Buffers: shared hit=2999 read=1
 Planning:
   Buffers: shared hit=87 read=7
 Planning Time: 0.419 ms
 Execution Time: 303.107 ms
(18 rows)
```
With the v2+stanullfrac adjustment, we get a Hash Join, which is faster:
```
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=1347.00..15436.61 rows=500161 width=16) (actual time=6.054..153.595 rows=500000.00 loops=1)
   Hash Cond: (o.val = i.val)
   Buffers: shared hit=2435
   ->  Seq Scan on hj_nullfrac_outer o  (cost=0.00..7213.00 rows=500000 width=8) (actual time=0.008..31.597 rows=500000.00 loops=1)
         Buffers: shared hit=2213
   ->  Hash  (cost=722.00..722.00 rows=50000 width=8) (actual time=6.041..6.042 rows=1000.00 loops=1)
         Buckets: 65536  Batches: 1  Memory Usage: 552kB
         Buffers: shared hit=222
         ->  Seq Scan on hj_nullfrac_inner i  (cost=0.00..722.00 rows=50000 width=8) (actual time=0.004..3.016 rows=50000.00 loops=1)
               Buffers: shared hit=222
 Planning:
   Buffers: shared hit=6
 Planning Time: 0.130 ms
 Execution Time: 167.903 ms
(14 rows)
```
Here is elog debug output comparing v2 with v2+stanullfrac adjustment for the example above:

v2:

```
ndistinct=1003.000000
nbuckets=65536.000000
stanullfrac=0.979933
mcv_freq=0.000020
estfract before clamping=0.000997
estfract after clamping=0.000997
```

v2+stanullfrac adjustment:

```
ndistinct=972.000000
nbuckets=65536.000000
stanullfrac=0.980567
mcv_freq=0.000020
estfract before stanullfrac adjustment=0.001029
estfract after stanullfrac adjustment=0.000020
estfract after clamping=0.000020
```
/Joel
| Attachment | Content-Type | Size |
|---|---|---|
| fix-stanullfrac.patch | application/octet-stream | 1.5 KB |