Re: Hash aggregate collisions cause excessive spilling

From: Andres Freund <andres(at)anarazel(dot)de>
To: Ants Aasma <ants(dot)aasma(at)cybertec(dot)at>
Cc: Tomas Vondra <tomas(at)vondra(dot)me>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Hash aggregate collisions cause excessive spilling
Date: 2026-02-19 19:01:15
Message-ID: 5pbulw5plv5owcg5iothxfg5xf6vj6ayfltppsg573kllpnx24@vd2iaqxnkasy
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2026-02-19 20:24:17 +0200, Ants Aasma wrote:
> On Thu, 19 Feb 2026 at 20:07, Ants Aasma <ants(dot)aasma(at)cybertec(dot)at> wrote:
> > I was thinking more along the lines of hashing together the pointer
> > value and worker number. But something more deterministic would indeed
> > be better. How about this?
> >
> > --- a/src/backend/executor/execGrouping.c
> > +++ b/src/backend/executor/execGrouping.c
> > @@ -201,3 +201,3 @@ BuildTupleHashTable(PlanState *parent,
> > MemoryContext oldcontext;
> > - uint32 hash_iv = 0;
> > + uint32 hash_iv = parent->plan->plan_node_id;
>
> I can confirm that this fixes the issue. A standalone reproducer is here:

I think this needs should use something that smears the bits from the plan_id
more widely. The hash functions of some types are ... peculiar (partially due
to cross-type hashjoin support).

I'd also combine it with use_variable_hash_iv, rather than just have
use_variable_hash_iv overwrite it.

> create table data as select random(1,1000000) from generate_series(1,10000000);
> vacuum analyze data;
> set enable_gathermerge = off;
> explain analyze select distinct random from data;

Took me a moment to reproduce with that. I also needed parallel_tuple_cost=0
and work_mem=4MB (something fairly small at least) to reproduce it.

With those I can reproduce that the query only terminates in a reasonable time
with the hash_iv = parent->plan->plan_node_id thing.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2026-02-19 19:19:33 Re: assume availability of "inline" keyword
Previous Message Andres Freund 2026-02-19 18:27:10 Re: Adding locks statistics