| From: | "Joel Jacobson" <joel(at)compiler(dot)org> |
|---|---|
| To: | "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
| Cc: | "Tender Wang" <tndrwang(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: [BUG?] estimate_hash_bucket_stats uses wrong ndistinct for avgfreq |
| Date: | 2026-03-03 17:33:31 |
| Message-ID: | 4a597843-0dc8-4fd0-a219-91d3d01490b5@app.fastmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Tue, Mar 3, 2026, at 16:31, Tom Lane wrote:
> "Joel Jacobson" <joel(at)compiler(dot)org> writes:
>> On Sun, Mar 1, 2026, at 22:12, Tom Lane wrote:
>>> Aside: you could argue that failing to consider stanullfrac is wrong,
>>> and maybe it is. But the more I looked at this code the more
>>> convinced I got that it was only partially accounting for nulls
>>> anyway. That seems like perhaps something to look into later.
>
>> How about adjusting estfract for the null fraction before clamping?
>
> This reminds me of the unfinished business at [1]. We really ought
> to make it true that nulls never get into the hash table before
> we assume that's so in costing. One of the things I was thinking
> was being overlooked is the possibility of lots of nulls bloating
> whichever hash bucket they get put in --- but if they aren't put
> into a bucket then it's not wrong to ignore them here.
>
> (Strictly speaking, that's still not so with non-strict hash operators,
> but those are so rare that I don't mind not accounting for them.)
>
> regards, tom lane
>
> [1] https://www.postgresql.org/message-id/flat/3061845(dot)1746486714(at)sss(dot)pgh(dot)pa(dot)us
Hmm, OK, so there are cases when we don't discard NULLs when we should
be able to? I was reading these lines in nodeHash.c and thought we would
always be discarding them when possible:
if (!isnull)
{
...
}
else if (node->keep_null_tuples)
{
/* null join key, but we must save tuple to be emitted later */
...
}
/* else we can discard the tuple immediately */
Thanks for the pointer to [1], I will dig into that thread, exciting!
/Joel
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Florin Irion | 2026-03-03 17:33:40 | UBSAN crash in EventTriggerCollectAlterTSConfig (memcpy with NULL src) |
| Previous Message | David Geier | 2026-03-03 17:31:51 | Re: Reduce build times of pg_trgm GIN indexes |