Re: [BUG?] estimate_hash_bucket_stats uses wrong ndistinct for avgfreq

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Joel Jacobson" <joel(at)compiler(dot)org>
Cc: "Tender Wang" <tndrwang(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUG?] estimate_hash_bucket_stats uses wrong ndistinct for avgfreq
Date: 2026-03-03 15:31:06
Message-ID: 1010506.1772551866@sss.pgh.pa.us
Lists: pgsql-hackers

"Joel Jacobson" <joel(at)compiler(dot)org> writes:
> On Sun, Mar 1, 2026, at 22:12, Tom Lane wrote:
>> Aside: you could argue that failing to consider stanullfrac is wrong,
>> and maybe it is. But the more I looked at this code the more
>> convinced I got that it was only partially accounting for nulls
>> anyway. That seems like perhaps something to look into later.

> How about adjusting estfract for the null fraction before clamping?

This reminds me of the unfinished business at [1]. We really ought
to make it true that nulls never get into the hash table before
we assume that's so in costing. One of the things I was thinking
was being overlooked is the possibility of lots of nulls bloating
whichever hash bucket they get put in --- but if they aren't put
into a bucket then it's not wrong to ignore them here.
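A toy model can make the bucket-bloat point concrete. This is a hypothetical Python sketch, not PostgreSQL's hash join code. `NBUCKETS` and `bucket_counts` are illustrative names. It assumes that non-null keys hash by simple modulo and that every null key hashes to the same bucket. It then compares the largest bucket when nulls are inserted with the largest bucket when they are skipped. Skipping is what a join with a strict equality operator can do, because a null key can never match.

```python
# Toy model only (hypothetical): not PostgreSQL's executor or costing code.
NBUCKETS = 8

def bucket_counts(keys, skip_nulls):
    """Return per-bucket row counts; None stands for a SQL null key."""
    counts = [0] * NBUCKETS
    for key in keys:
        if key is None:
            if skip_nulls:
                continue  # a strict "=" can never match a null, so don't store it
            counts[0] += 1  # assumption: every null hashes alike, so all share one bucket
        else:
            counts[key % NBUCKETS] += 1  # assumption: simple modulo hashing
    return counts

# 16 distinct non-null keys plus 24 nulls (a 60% null fraction).
keys = list(range(16)) + [None] * 24
print("nulls inserted:", max(bucket_counts(keys, skip_nulls=False)))  # → 26
print("nulls skipped: ", max(bucket_counts(keys, skip_nulls=True)))   # → 2
```

In this model, inserting the nulls piles 24 extra rows into one bucket. Skipping them leaves only non-null rows in the table, so in this model a bucket-size estimate based on those rows does not need to account for nulls.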

(Strictly speaking, that's still not so with non-strict hash operators,
but those are so rare that I don't mind not accounting for them.)

regards, tom lane

[1] https://www.postgresql.org/message-id/flat/3061845(dot)1746486714(at)sss(dot)pgh(dot)pa(dot)us
