Re: [BUG?] estimate_hash_bucket_stats uses wrong ndistinct for avgfreq

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	"Joel Jacobson" <joel(at)compiler(dot)org>
Cc:	"Tender Wang" <tndrwang(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [BUG?] estimate_hash_bucket_stats uses wrong ndistinct for avgfreq
Date:	2026-03-04 20:50:36
Message-ID:	1657589.1772657436@sss.pgh.pa.us
Lists:	pgsql-hackers

"Joel Jacobson" <joel(at)compiler(dot)org> writes:
> On Tue, Mar 3, 2026, at 16:31, Tom Lane wrote:
>> This reminds me of the unfinished business at [1]. We really ought
>> to make it true that nulls never get into the hash table before
>> we assume that's so in costing.

> Hmm, OK, so there are cases when we don't discard NULLs when we should
> be able to? I was reading these lines in nodeHash.c and thought we would
> always be discarding them when possible:

>     if (!isnull)
>     {
>         ...
>     }
>     else if (node->keep_null_tuples)
>     {
>         /* null join key, but we must save tuple to be emitted later */
>         ...
>     }
>     /* else we can discard the tuple immediately */

I'm confused ... that keep_null_tuples bit appears nowhere in HEAD,
but it does appear in the patch at [1].

Anyway, the short answer is that we discard NULLs if possible, but
it's not possible when doing an outer join that requires returning
null-extended rows from the hashed side.
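
As a rough illustration of that rule, here is a minimal standalone sketch. It follows the shape of the snippet quoted above, which comes from the patch at [1] and not from HEAD. The enum, the function name, and the must_emit_unmatched_inner flag are all hypothetical and are not PostgreSQL identifiers; the sketch also assumes a strict join operator, so a null key can never match anything.

```c
#include <stdbool.h>

/* Hypothetical outcomes for one hashed-side tuple; not PostgreSQL's API. */
typedef enum
{
    TUPLE_INSERT,               /* non-null key: goes into the hash table */
    TUPLE_KEEP,                 /* null key, but row must be emitted later */
    TUPLE_DISCARD               /* null key and nobody needs the row */
} TupleAction;

/*
 * Decide what to do with a hashed-side tuple.
 *
 * key_is_null: the join key evaluated to NULL for this tuple.
 * must_emit_unmatched_inner: assumed flag, true when the join type has to
 * return null-extended rows from the hashed side.
 */
static TupleAction
classify_inner_tuple(bool key_is_null, bool must_emit_unmatched_inner)
{
    if (!key_is_null)
        return TUPLE_INSERT;
    if (must_emit_unmatched_inner)
        return TUPLE_KEEP;      /* cannot match, but still has to be output */
    return TUPLE_DISCARD;       /* cannot match and is never output */
}
```

Only the TUPLE_DISCARD case lets costing assume the null rows are gone; in the TUPLE_KEEP case they still have to be stored somewhere until the end of the join.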

I've now pushed the patch we were discussing before, and all that's
left to worry about (AFAIK) in estimate_hash_bucket_stats is its
handling of null join keys. I'd prefer to get the other patch
in before worrying more about that.

regards, tom lane

[1] https://www.postgresql.org/message-id/flat/3061845.1746486714%40sss.pgh.pa.us
