| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
|---|---|
| To: | "Joel Jacobson" <joel(at)compiler(dot)org> |
| Cc: | "Tender Wang" <tndrwang(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: [BUG?] estimate_hash_bucket_stats uses wrong ndistinct for avgfreq |
| Date: | 2026-03-01 00:08:05 |
| Message-ID: | 53750.1772323685@sss.pgh.pa.us |
| Lists: | pgsql-hackers |

"Joel Jacobson" <joel(at)compiler(dot)org> writes:
> On Fri, Feb 27, 2026, at 20:15, Tom Lane wrote:
>> Joel, do you want to run this to ground, and in particular
>> see if that way of fixing it passes your sanity tests?

> Challenge accepted!

Thanks!

> [...hours later...]
> My conclusion is that we still need to move avgfreq
> computation, like I suggested.

Hmm ... doesn't this contradict your argument that avgfreq and
mcv_freq need to be calculated on the same basis? Admittedly
that was just a heuristic, but I'm not seeing why it's wrong.

> The reason for this is that estfract is calculated as:
> estfract = 1.0 / ndistinct;
> where ndistinct has been adjusted to account for restriction clauses.
> Therefore, we must also use the adjusted avgfreq when adjusting
> estfract here:

It feels like that might end up double-counting the effects of
the restriction clauses.

Anyway, we all seem to agree that s/rel->rows/rel->tuples/ is the
correct fix for a newly-introduced bug. I'm inclined to proceed
by committing that fix (along with any regression test fallout)
and then investigating the avgfreq change as an independent matter.

regards, tom lane
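
To make the avgfreq question above concrete, here is a minimal numeric sketch of the two orderings under discussion. It is a simplified model written for illustration, not the code in estimate_hash_bucket_stats: the function name `bucket_fraction`, its parameters, and the `avgfreq_after_adjustment` switch are all hypothetical, and the real function contains steps that are omitted here. The sketch assumes that ndistinct is scaled by rows/tuples to account for restriction clauses, that estfract starts at 1/ndistinct (or 1/nbuckets), and that it is then scaled up by mcv_freq/avgfreq when the most common value is more frequent than average.

```python
def bucket_fraction(ndistinct_raw, nullfrac, mcv_freq, rows, tuples,
                    nbuckets, avgfreq_after_adjustment):
    """Simplified, hypothetical model of the bucket-size fraction estimate."""
    # Average frequency of a distinct value in the raw (unrestricted) relation.
    avgfreq = (1.0 - nullfrac) / ndistinct_raw

    # Scale ndistinct by the fraction of rows that survive restriction clauses.
    ndistinct = max(1.0, ndistinct_raw * rows / tuples)

    # The variant proposed in the thread: derive avgfreq from the adjusted ndistinct.
    if avgfreq_after_adjustment:
        avgfreq = (1.0 - nullfrac) / ndistinct

    # Initial estimate: 1/nbuckets if there are more distinct values than buckets.
    estfract = 1.0 / nbuckets if ndistinct > nbuckets else 1.0 / ndistinct

    # Skew correction: scale up when the most common value exceeds the average.
    if avgfreq > 0.0 and mcv_freq > avgfreq:
        estfract *= mcv_freq / avgfreq

    return min(estfract, 1.0)


# Made-up inputs: 1000 distinct values, 10% of the rows pass the restrictions.
args = dict(ndistinct_raw=1000.0, nullfrac=0.0, mcv_freq=0.01,
            rows=100.0, tuples=1000.0, nbuckets=1024.0)
raw_basis = bucket_fraction(avgfreq_after_adjustment=False, **args)
adjusted_basis = bucket_fraction(avgfreq_after_adjustment=True, **args)
print(raw_basis, adjusted_basis)
```

With these made-up inputs the two orderings differ by the same factor that ndistinct was scaled by, so the choice of basis for avgfreq (and for mcv_freq, which the sketch holds fixed) directly drives the result. The sketch does not settle which ordering is right; it only shows the size of the effect being debated.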