| From: | Scott Carey <scott(dot)carey(at)algonomy(dot)com> |
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
| Cc: | Jeff Davis <pgsql(at)j-davis(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, pgsql-performance(at)lists(dot)postgresql(dot)org |
| Subject: | Re: Significant performance issues with array_agg() + HashAggregate plans on Postgres 17 |
| Date: | 2026-04-02 19:08:13 |
| Message-ID: | CA+vubOH1mcEbsHnkm5L7aJWR+eeiVpOnGfcdWax9td6pyOkMww@mail.gmail.com |
| Lists: | pgsql-performance |
On Thu, Apr 2, 2026, 10:38 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> I did some bisecting using the attached simplified test case, and found
> that the query execution time jumps from circa 60ms to circa 7500ms here:
>
> 1f39bce021540fde00990af55b4432c55ef4b3c7 is the first bad commit
> commit 1f39bce021540fde00990af55b4432c55ef4b3c7
> Author: Jeff Davis <jdavis(at)postgresql(dot)org>
> Date: Wed Mar 18 15:42:02 2020 -0700
>
> Disk-based Hash Aggregation.
>
> While performing hash aggregation, track memory usage when adding new
> groups to a hash table. If the memory usage exceeds work_mem, enter
> "spill mode".
>
> (Times quoted are on a Mac M4 Pro, but in assert-enabled builds so
> maybe not directly comparable to production.)
>
> I'm bemused as to why: the test case has work_mem set high enough that
> we shouldn't be triggering spill mode, so why did this change affect
> it at all?
>
Even stranger, the more spills we induce via a smaller work_mem, the faster it goes.
This suggests something is getting more expensive as the hash table grows; significantly more expensive, like O(n^2) or worse.
I wonder if it is the size of the hash table itself (entry count) or the
size of the entries. Does a table with one row matching each entry have
the problem, or only one where each hash bucket is hit multiple times and
values are aggregated? And why is the reported memory usage so much larger
with the custom function?
I have some experiments in mind that could answer some of these.
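For instance, something along these lines should separate entry count from per-entry state size (not a script from this thread, just a minimal sketch; the table and column names are invented, and work_mem is set high to keep spill mode out of the picture):

```sql
-- Same total row count, very different grouping shapes:
-- one_per_group  -> 1M groups, one value per group (large entry count)
-- many_per_group -> 100k groups, ~10 values per group (larger per-entry state)
CREATE TABLE one_per_group AS
    SELECT g AS k, g AS v FROM generate_series(1, 1000000) g;

CREATE TABLE many_per_group AS
    SELECT g % 100000 AS k, g AS v FROM generate_series(1, 1000000) g;

ANALYZE one_per_group;
ANALYZE many_per_group;

SET work_mem = '1GB';  -- large enough that HashAggregate should not spill

EXPLAIN (ANALYZE, BUFFERS)
SELECT k, array_agg(v) FROM one_per_group GROUP BY k;

EXPLAIN (ANALYZE, BUFFERS)
SELECT k, array_agg(v) FROM many_per_group GROUP BY k;
```

If only the second query is slow, the cost scales with per-entry aggregate state rather than with the number of hash table entries.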
Tracking hash table memory usage dynamically can be tricky. I would
imagine that user-defined aggregates make it more difficult.
> regards, tom lane
>
>