| From: | Andres Freund <andres(at)anarazel(dot)de> |
|---|---|
| To: | Ants Aasma <ants(dot)aasma(at)cybertec(dot)at> |
| Cc: | Tomas Vondra <tomas(at)vondra(dot)me>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Hash aggregate collisions cause excessive spilling |
| Date: | 2026-02-19 19:01:15 |
| Message-ID: | 5pbulw5plv5owcg5iothxfg5xf6vj6ayfltppsg573kllpnx24@vd2iaqxnkasy |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi,
On 2026-02-19 20:24:17 +0200, Ants Aasma wrote:
> On Thu, 19 Feb 2026 at 20:07, Ants Aasma <ants(dot)aasma(at)cybertec(dot)at> wrote:
> > I was thinking more along the lines of hashing together the pointer
> > value and worker number. But something more deterministic would indeed
> > be better. How about this?
> >
> > --- a/src/backend/executor/execGrouping.c
> > +++ b/src/backend/executor/execGrouping.c
> > @@ -201,3 +201,3 @@ BuildTupleHashTable(PlanState *parent,
> > MemoryContext oldcontext;
> > - uint32 hash_iv = 0;
> > + uint32 hash_iv = parent->plan->plan_node_id;
>
> I can confirm that this fixes the issue. A standalone reproducer is here:
I think this needs should use something that smears the bits from the plan_id
more widely. The hash functions of some types are ... peculiar (partially due
to cross-type hashjoin support).
I'd also combine it with use_variable_hash_iv, rather than just have
use_variable_hash_iv overwrite it.
> create table data as select random(1,1000000) from generate_series(1,10000000);
> vacuum analyze data;
> set enable_gathermerge = off;
> explain analyze select distinct random from data;
Took me a moment to reproduce with that. I also needed parallel_tuple_cost=0
and work_mem=4MB (something fairly small at least) to reproduce it.
With those I can reproduce that the query only terminates in a reasonable time
with the hash_iv = parent->plan->plan_node_id thing.
Greetings,
Andres Freund
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Tom Lane | 2026-02-19 19:19:33 | Re: assume availability of "inline" keyword |
| Previous Message | Andres Freund | 2026-02-19 18:27:10 | Re: Adding locks statistics |