Re: Hash aggregate collisions cause excessive spilling

From: Ants Aasma <ants(dot)aasma(at)cybertec(dot)at>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tomas Vondra <tomas(at)vondra(dot)me>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Hash aggregate collisions cause excessive spilling
Date: 2026-02-19 17:06:04
Message-ID: CANwKhkNQjUwSR_2KPE0my1cYiMyFij_50Bmfdy0eWsy-5_p5WQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 19 Feb 2026 at 18:32, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > Interestingly the plan doesn't have partial and final on those hash agg nodes:
> > ...
> > There are timescale tables involved in the plan, so I think timescale
> > might be behind that.
>
> Hm, so timescale creates a plan that we would not?

No, after poking around in the query, it's actually just that the
aggregate is for implementing DISTINCT, which means there is no
aggregate state.

> > I'm wondering if some way to decorrelate the hashtables would help.
> > For example a hashtable specific (pseudo)random salt.
>
> We do try to add a hash-IV that's different for each worker:
>
> /*
> * If parallelism is in use, even if the leader backend is performing the
> * scan itself, we don't want to create the hashtable exactly the same way
> * in all workers. As hashtables are iterated over in keyspace-order,
> * doing so in all processes in the same way is likely to lead to
> * "unbalanced" hashtables when the table size initially is
> * underestimated.
> */
> if (use_variable_hash_iv)
> hash_iv = murmurhash32(ParallelWorkerNumber);
>
>
> I don't remember enough of how the parallel aggregate stuff works. Perhaps the
> issue is that the leader is also building a hashtable and it's being inserted
> into the post-gather hashtable, using the same IV?
>
> In which case parallel_leader_participation=off should make a difference.

After turning leader participation off the problem no longer
reproduced even after 10 iterations, turning it back on it reproduced
on the 4th iteration. Is there any reason why the hash table couldn't
have an unconditional iv that includes the plan node?

Regards,
Ants Aasma

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bertrand Drouvot 2026-02-19 17:15:53 Re: pg_stat_io_histogram
Previous Message Dragos Andriciuc 2026-02-19 17:04:43 Re: DOCS - Add introductory paragraph to Getting Started chapter