Quick Links

Re: [BUG?] estimate_hash_bucket_stats uses wrong ndistinct for avgfreq

From:	"Joel Jacobson" <joel(at)compiler(dot)org>
To:	"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"Tender Wang" <tndrwang(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [BUG?] estimate_hash_bucket_stats uses wrong ndistinct for avgfreq
Date:	2026-03-02 08:39:15
Message-ID:	08da4388-d20b-4183-8ff6-5d83fa3fbb56@app.fastmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Sun, Mar 1, 2026, at 22:12, Tom Lane wrote:
> What is more sensible I think is just to clamp estfract to be at least
> mcv_freq always, dispensing with a whole bunch of the complication
> here. We actually don't need avgfreq at all, and therefore not
> stanullfrac, and also the ad-hoc range clamp at the bottom no longer
> seems necessary. Interestingly, this also makes the logic more
> nearly like the "isdefault" early exit:
>
> *bucketsize_frac = (Selectivity) Max(0.1, *mcv_freq);
>
> which IIRC was added a lot later than the original logic.

Nice simplification.

> So I end with the attached draft patch. Very interestingly,
> neither this version nor your lets-calculate-avgfreq-later
> patch change any regression tests at all compared to git HEAD.

LGTM.

> Maybe we should try to add a case that does change.

Yes, I think that would be good.
Attached patch adds a new test that does change.

/Joel

Attachment	Content-Type	Size
estimate_hash_bucket_stats-test.patch	application/octet-stream	2.4 KB

In response to

Re: [BUG?] estimate_hash_bucket_stats uses wrong ndistinct for avgfreq at 2026-03-01 21:12:22 from Tom Lane

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Jakub Wartak	2026-03-02 08:39:22	Re: 64-bit wait_event and introduction of 32-bit wait_event_arg
Previous Message	Hüseyin Demir	2026-03-02 08:14:51	Re: Don't keep closed WAL segment in page cache after replay