| From: | Tomas Vondra <tomas(at)vondra(dot)me> |
|---|---|
| To: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
| Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Adrian Mönnich <adrian(dot)moennich(at)cern(dot)ch>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-bugs(at)lists(dot)postgresql(dot)org, Tomas Vondra <tv(at)fuzzy(dot)cz> |
| Subject: | Re: BUG #19449: Massive performance degradation for complex query on Postgres 16+ (few seconds -> multiple hours) |
| Date: | 2026-04-16 18:52:53 |
| Message-ID: | 298af5cb-5547-4c95-b988-7be0617b17b5@vondra.me |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-bugs |
On 4/16/26 07:25, Thomas Munro wrote:
> On Sun, Apr 5, 2026 at 2:45 AM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>> At this point I was suspecting the data distributions for the join
>> columns may be somewhat weird, causing issues for the hashjoin batching.
>> For events.contributions.id it's perfectly fine - it's entirely unique,
>> with each ID having 1 entry. Unsurprisingly, because it's the PK. But
>> for attachments.folders.contribution_id I see this:
>>
>> SELECT contribution_id, count(*) FROM attachments.folders
>> GROUP BY contribution_id ORDER BY 2 DESC;
>>
>> contribution_id | count
>> -----------------+--------
>> | 464515
>> 5492978 | 67
>> 4117499 | 42
>> 4045045 | 41
>> ...
>>
>> So there's ~500k entries with NULL, that can't possibly match to
>> anything (right)? I assume we still add them to the hash, though.
>
> That's also the conditions required to prevent the
> "stop-partitioning-it's-not-working" logic from triggering. That
> thing where we know we need to pick a better lower than 100%. But
> what?
>
> Did this commit help?
>
> commit 1811f1af98fb237fdd5adb588cd4b57c433b75f8
> Author: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> Date: Thu Mar 19 15:21:36 2026 -0400
>
> Improve hash join's handling of tuples with null join keys.
Possibly. With the original (simplified) query, I get no failures with
current master. And it starts failing after I revert 1811f1af98.
With the alternative queries (with IS NOT NULL), it seems to work OK
even after the revert. So maybe the queries are not failing for the same
reason?
regards
--
Tomas Vondra
| From | Date | Subject | |
|---|---|---|---|
| Next Message | surya poondla | 2026-04-16 23:20:46 | Re: BUG #19382: Server crash at __nss_database_lookup |
| Previous Message | Masahiko Sawada | 2026-04-16 17:58:40 | Re: TRAP: failed Assert("offsets[i] > offsets[i - 1]"), File: "tidstore.c" |