Re: [BUG?] estimate_hash_bucket_stats uses wrong ndistinct for avgfreq

From: "Joel Jacobson" <joel(at)compiler(dot)org>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Tender Wang" <tndrwang(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUG?] estimate_hash_bucket_stats uses wrong ndistinct for avgfreq
Date: 2026-03-03 17:33:31
Message-ID: 4a597843-0dc8-4fd0-a219-91d3d01490b5@app.fastmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 3, 2026, at 16:31, Tom Lane wrote:
> "Joel Jacobson" <joel(at)compiler(dot)org> writes:
>> On Sun, Mar 1, 2026, at 22:12, Tom Lane wrote:
>>> Aside: you could argue that failing to consider stanullfrac is wrong,
>>> and maybe it is. But the more I looked at this code the more
>>> convinced I got that it was only partially accounting for nulls
>>> anyway. That seems like perhaps something to look into later.
>
>> How about adjusting estfract for the null fraction before clamping?
>
> This reminds me of the unfinished business at [1]. We really ought
> to make it true that nulls never get into the hash table before
> we assume that's so in costing. One of the things I was thinking
> was being overlooked is the possibility of lots of nulls bloating
> whichever hash bucket they get put in --- but if they aren't put
> into a bucket then it's not wrong to ignore them here.
>
> (Strictly speaking, that's still not so with non-strict hash operators,
> but those are so rare that I don't mind not accounting for them.)
>
> regards, tom lane
>
> [1] https://www.postgresql.org/message-id/flat/3061845(dot)1746486714(at)sss(dot)pgh(dot)pa(dot)us

Hmm, OK, so there are cases when we don't discard NULLs when we should
be able to? I was reading these lines in nodeHash.c and thought we would
always be discarding them when possible:

if (!isnull)
{
...
}
else if (node->keep_null_tuples)
{
/* null join key, but we must save tuple to be emitted later */
...
}
/* else we can discard the tuple immediately */

Thanks for the pointer to [1], I will dig into that thread, exciting!

/Joel

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Florin Irion 2026-03-03 17:33:40 UBSAN crash in EventTriggerCollectAlterTSConfig (memcpy with NULL src)
Previous Message David Geier 2026-03-03 17:31:51 Re: Reduce build times of pg_trgm GIN indexes