Re: [BUG?] estimate_hash_bucket_stats uses wrong ndistinct for avgfreq

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Joel Jacobson" <joel(at)compiler(dot)org>
Cc: "Tender Wang" <tndrwang(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUG?] estimate_hash_bucket_stats uses wrong ndistinct for avgfreq
Date: 2026-03-01 00:08:05
Message-ID: 53750.1772323685@sss.pgh.pa.us
Lists: pgsql-hackers

"Joel Jacobson" <joel(at)compiler(dot)org> writes:
> On Fri, Feb 27, 2026, at 20:15, Tom Lane wrote:
>> Joel, do you want to run this to ground, and in particular
>> see if that way of fixing it passes your sanity tests?

> Challenge accepted!

Thanks!

> [...hours later...]
> My conclusion is that we still need to move avgfreq
> computation, like I suggested.

Hmm ... doesn't this contradict your argument that avgfreq and
mcv_freq need to be calculated on the same basis? Admittedly
that was just a heuristic, but I'm not seeing why it's wrong.

> The reason for this is that estfract is calculated as:
> estfract = 1.0 / ndistinct;
> where ndistinct has been adjusted to account for restriction clauses.
> Therefore, we must also use the adjusted avgfreq when adjusting
> estfract here:

It feels like that might end up double-counting the effects of
the restriction clauses.
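
[Editorial illustration, not part of the original message.] The two
positions can be compared with a small, simplified sketch. Everything
here is an assumption made for illustration: the struct, the function
name sketch_estfract, the flag that chooses when avgfreq is computed,
and the clamping are hypothetical and are not PostgreSQL's actual code
in estimate_hash_bucket_stats. The only line taken from the thread is
estfract = 1.0 / ndistinct with a restriction-adjusted ndistinct; the
MCV scaling step (multiplying by mcv_freq / avgfreq) is assumed from
the discussion of "adjusting estfract".

```c
#include <stdio.h>

/* Hypothetical, simplified inputs; NOT PostgreSQL's real data structures. */
typedef struct
{
    double ndistinct;   /* distinct values in the whole table */
    double nullfrac;    /* fraction of NULLs in the column */
    double mcv_freq;    /* frequency of the most common value */
    double tuples;      /* rows in the table (cf. rel->tuples) */
    double rows;        /* rows surviving restrictions (cf. rel->rows) */
    double nbuckets;    /* number of hash buckets */
} BucketInputs;

/*
 * Sketch of a bucket-fraction estimate.
 *
 * avgfreq_after_adjust == 0: avgfreq uses the whole-table ndistinct.
 * avgfreq_after_adjust != 0: avgfreq uses the restriction-adjusted
 *                            ndistinct (the "move avgfreq" proposal).
 */
static double
sketch_estfract(const BucketInputs *in, int avgfreq_after_adjust)
{
    double ndistinct = in->ndistinct;
    double avgfreq = (1.0 - in->nullfrac) / ndistinct;
    double estfract;

    /* Scale ndistinct by the fraction of rows passing the restrictions. */
    if (in->tuples > 0.0)
        ndistinct *= in->rows / in->tuples;
    if (ndistinct < 1.0)
        ndistinct = 1.0;        /* assumed clamp, for illustration only */

    if (avgfreq_after_adjust)
        avgfreq = (1.0 - in->nullfrac) / ndistinct;

    /* With more distinct values than buckets, buckets are the limit. */
    if (ndistinct > in->nbuckets)
        estfract = 1.0 / in->nbuckets;
    else
        estfract = 1.0 / ndistinct;

    /* Assumed skew correction: scale up if the MCV is above average. */
    if (avgfreq > 0.0 && in->mcv_freq > avgfreq)
        estfract *= in->mcv_freq / avgfreq;

    if (estfract > 1.0)
        estfract = 1.0;
    return estfract;
}
```

With made-up inputs (1000 distinct values, no NULLs, mcv_freq = 0.01,
100000 tuples, 10000 rows after restriction, 1024 buckets), the
whole-table avgfreq variant yields roughly 0.1 and the adjusted
variant roughly 0.01. Which of the two is right is exactly the open
question in this thread; the sketch only shows that the choice of
basis for avgfreq changes the result by the restriction selectivity.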

Anyway, we all seem to agree that s/rel->rows/rel->tuples/ is the
correct fix for a newly-introduced bug. I'm inclined to proceed
by committing that fix (along with any regression test fallout)
and then investigating the avgfreq change as an independent matter.

regards, tom lane
