Re: Hash aggregate collisions cause excessive spilling

From: Andres Freund <andres(at)anarazel(dot)de>
To: Ants Aasma <ants(dot)aasma(at)cybertec(dot)at>
Cc: Tomas Vondra <tomas(at)vondra(dot)me>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Hash aggregate collisions cause excessive spilling
Date: 2026-02-19 17:30:07
Message-ID: vx4azu62rgrnkt4oauviepbydxj5q7wbtzycwmqnmby2sfpvwc@xfvp3pcjnv2w
Lists: pgsql-hackers

Hi,

On 2026-02-19 19:06:04 +0200, Ants Aasma wrote:
> >
> > /*
> >  * If parallelism is in use, even if the leader backend is performing the
> >  * scan itself, we don't want to create the hashtable exactly the same way
> >  * in all workers. As hashtables are iterated over in keyspace-order,
> >  * doing so in all processes in the same way is likely to lead to
> >  * "unbalanced" hashtables when the table size initially is
> >  * underestimated.
> >  */
> > if (use_variable_hash_iv)
> >     hash_iv = murmurhash32(ParallelWorkerNumber);
> >
> >
> > I don't remember enough of how the parallel aggregate stuff works. Perhaps the
> > issue is that the leader is also building a hashtable and it's being inserted
> > into the post-gather hashtable, using the same IV?
> >
> > In which case parallel_leader_participation=off should make a difference.
>
> After turning leader participation off the problem no longer
> reproduced even after 10 iterations, turning it back on it reproduced
> on the 4th iteration. Is there any reason why the hash table couldn't
> have an unconditional iv that includes the plan node?

You mean just use the numerical value of the pointer? I think that'd be pretty
likely to be the same between parallel workers. And I think it's not great for
benchmarking / debugging if every run ends up with a different IV.

But we certainly should do something about the IV for the leader in these
cases.
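
To illustrate the reproducibility point above, here is a minimal sketch (not
code from the tree, and not a proposed patch) of an IV that depends only on
the worker number. sketch_murmurhash32() and sketch_hash_iv() are hypothetical
names chosen for this example. The mixing steps are those of the 32-bit
MurmurHash3 finalizer, and -1 stands in for the leader, which is the value
ParallelWorkerNumber has outside of a parallel worker.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for murmurhash32(): the 32-bit MurmurHash3
 * finalizer. Every step (xor-shift, multiply by an odd constant) is
 * invertible, so the function is a bijection on 32-bit values.
 */
static inline uint32_t
sketch_murmurhash32(uint32_t data)
{
    uint32_t h = data;

    h ^= h >> 16;
    h *= 0x85ebca6bU;
    h ^= h >> 13;
    h *= 0xc2b2ae35U;
    h ^= h >> 16;
    return h;
}

/*
 * Hypothetical helper: derive a hash IV from the worker number alone.
 * Assumption: -1 denotes the leader, workers are numbered from 0.
 * The result is the same on every run for a given worker number.
 */
static inline uint32_t
sketch_hash_iv(int worker_number)
{
    assert(worker_number >= -1);
    return sketch_murmurhash32((uint32_t) worker_number);
}
```

Since the mixing function is a bijection, distinct worker numbers always get
distinct IVs, and a given worker number gets the same IV on every run, unlike
an IV derived from a pointer value. Note that an input of 0 maps to 0. The
sketch says nothing about which IV the post-gather hashtable in the leader
ends up with, so it does not address the leader case.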

Greetings,

Andres Freund
