Re: [BUG?] estimate_hash_bucket_stats uses wrong ndistinct for avgfreq

From: "Joel Jacobson" <joel(at)compiler(dot)org>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUG?] estimate_hash_bucket_stats uses wrong ndistinct for avgfreq
Date: 2026-02-25 20:31:57
Message-ID: bdd5fefb-e71a-4aef-93d9-68179bb33d08@app.fastmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Feb 25, 2026, at 13:19, Joel Jacobson wrote:
> On Tue, Feb 24, 2026, at 19:21, Joel Jacobson wrote:
>> This bug seems to sometimes cause the wrong table, the larger table, to
>> be hashed in a Hash Join, and the smaller table to be used for probing.

To help reviewers, instead of relying on benchmark results,
I realized it's much better if we can actually prove the calculation
of skew_ratio is incorrect, and that it becomes correct with the fix.

I therefore added debugging here:

if (avgfreq > 0.0 && *mcv_freq > avgfreq)
+ {
+ elog(DEBUG1, "mcv_freq=%g, avgfreq=%g, skew_ratio=%g",
+ *mcv_freq, avgfreq, *mcv_freq / avgfreq);
estfract *= *mcv_freq / avgfreq;
+ }

And created this minimal schema to prove the incorrect calculation:

CREATE TABLE orders (
order_id bigint NOT NULL,
tracking_code bigint,
region int NOT NULL
);

CREATE TABLE shipments (
shipment_id bigint NOT NULL,
tracking_code bigint NOT NULL
);

-- Skewed tracking_code distribution:
-- 10% of rows share a single "hot" tracking_code (value 1),
-- the rest get unique codes.
-- This ensures mcv_freq > avgfreq, triggering the skew adjustment
-- even in the fixed version of estimate_hash_bucket_stats.
INSERT INTO orders
SELECT
g,
CASE
WHEN g > 150000 THEN NULL
WHEN g <= 15000 THEN 1
ELSE g
END,
(g % 1000) + 1
FROM generate_series(1, 200000) g;

INSERT INTO shipments
SELECT g, CASE WHEN g <= 15000 THEN 1 ELSE g END
FROM generate_series(1, 150000) g;

CREATE INDEX ON orders (region);

ANALYZE orders;
ANALYZE shipments;

-- Ground truth of the skew_ratio for orders WHERE region = 42:
WITH base AS (
SELECT
count(*)::numeric AS total,
count(tracking_code)::numeric AS nonnull,
count(DISTINCT tracking_code) AS ndistinct
FROM orders
WHERE region = 42
),
mcv AS (
SELECT count(*)::numeric AS mcv_count
FROM orders
WHERE region = 42 AND tracking_code IS NOT NULL
GROUP BY tracking_code
ORDER BY count(*) DESC
LIMIT 1
)
SELECT
round(mcv_count / total, 6) AS mcv_freq,
round(nonnull / total / ndistinct, 6) AS avgfreq,
round((mcv_count / total) / (nonnull / total / ndistinct), 6) AS skew_ratio
FROM base, mcv;

mcv_freq | avgfreq | skew_ratio
----------+----------+------------
0.075000 | 0.005515 | 13.600000
(1 row)

SET client_min_messages = DEBUG1;

SELECT *
FROM orders o
JOIN shipments s ON s.tracking_code = o.tracking_code
WHERE o.region = 42;

-- HEAD (65707ed):
DEBUG: mcv_freq=0.0748333, avgfreq=8.69725e-06, skew_ratio=8604.25

-- 0001-Fix-estimate_hash_bucket_stats-to-use-filtered-ndist.patch:
DEBUG: mcv_freq=0.0738667, avgfreq=0.00871124, skew_ratio=8.47947

In HEAD, the skew_ratio for orders.tracking_code is wrongly estimated
to 8604.25, when the ground truth is 13.6, which the fixed estimate
8.47947, is a good approximation of.

The fix only moves the computation of avgfreq from before the
ndistinct adjustment, to after it:

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 29fec65559..d19c4b5d96 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -4456,9 +4456,6 @@ estimate_hash_bucket_stats(PlannerInfo *root, Node *hashkey, double nbuckets,
else
stanullfrac = 0.0;

- /* Compute avg freq of all distinct data values in raw relation */
- avgfreq = (1.0 - stanullfrac) / ndistinct;
-
/*
* Adjust ndistinct to account for restriction clauses. Observe we are
* assuming that the data distribution is affected uniformly by the
@@ -4473,6 +4470,9 @@ estimate_hash_bucket_stats(PlannerInfo *root, Node *hashkey, double nbuckets,
ndistinct = clamp_row_est(ndistinct);
}

+ /* Compute avg freq of all distinct data values in the filtered relation */
+ avgfreq = (1.0 - stanullfrac) / ndistinct;
+
/*
* Initial estimate of bucketsize fraction is 1/nbuckets as long as the
* number of buckets is less than the expected number of distinct values;

/Joel

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Davis 2026-02-25 21:03:01 Re: Expanding HOT updates for expression and partial indexes
Previous Message Andres Freund 2026-02-25 20:29:01 Re: More speedups for tuple deformation