| From: | Scott Carey <scott(dot)carey(at)algonomy(dot)com> |
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
| Cc: | Jeff Davis <pgsql(at)j-davis(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, pgsql-performance(at)lists(dot)postgresql(dot)org |
| Subject: | Re: Significant performance issues with array_agg() + HashAggregate plans on Postgres 17 |
| Date: | 2026-04-02 19:08:13 |
| Message-ID: | CA+vubOH1mcEbsHnkm5L7aJWR+eeiVpOnGfcdWax9td6pyOkMww@mail.gmail.com |
| Lists: | pgsql-performance |
On Thu, Apr 2, 2026, 10:38 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> I did some bisecting using the attached simplified test case, and found
> that the query execution time jumps from circa 60ms to circa 7500ms here:
>
> 1f39bce021540fde00990af55b4432c55ef4b3c7 is the first bad commit
> commit 1f39bce021540fde00990af55b4432c55ef4b3c7
> Author: Jeff Davis <jdavis(at)postgresql(dot)org>
> Date: Wed Mar 18 15:42:02 2020 -0700
>
> Disk-based Hash Aggregation.
>
> While performing hash aggregation, track memory usage when adding new
> groups to a hash table. If the memory usage exceeds work_mem, enter
> "spill mode".
>
> (Times quoted are on a Mac M4 Pro, but in assert-enabled builds so
> maybe not directly comparable to production.)
>
> I'm bemused as to why: the test case has work_mem set high enough that
> we shouldn't be triggering spill mode, so why did this change affect
> it at all?
>
Even stranger, the more spills we induce via a smaller work_mem, the faster it goes.
This suggests something is getting more expensive as the hash table grows; significantly more expensive, like O(n^2) or worse.
I wonder if it is the size of the hash table itself (entry count) or the
size of the entries. Does a table with one row matching each entry have
the problem, or only one where each hash bucket is hit multiple times and
values are aggregated? And why is the reported memory usage so much larger
with the custom function?
I have some experiments in mind that could answer some of these.
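For instance, something along these lines should separate entry count from per-entry state size (not a script from this thread, just a minimal sketch; the table and column names are invented, and work_mem is set high to keep spill mode out of the picture):

```sql
-- Same total row count, very different grouping shapes:
-- one_per_group  -> 1M groups, one value per group (large entry count)
-- many_per_group -> 100k groups, ~10 values per group (larger per-entry state)
CREATE TABLE one_per_group AS
    SELECT g AS k, g AS v FROM generate_series(1, 1000000) g;

CREATE TABLE many_per_group AS
    SELECT g % 100000 AS k, g AS v FROM generate_series(1, 1000000) g;

ANALYZE one_per_group;
ANALYZE many_per_group;

SET work_mem = '1GB';  -- large enough that HashAggregate should not spill

EXPLAIN (ANALYZE, BUFFERS)
SELECT k, array_agg(v) FROM one_per_group GROUP BY k;

EXPLAIN (ANALYZE, BUFFERS)
SELECT k, array_agg(v) FROM many_per_group GROUP BY k;
```

If only the second query is slow, the cost scales with per-entry aggregate state rather than with the number of hash table entries.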
Tracking hash table memory usage dynamically can be tricky. I would
imagine that user-defined aggregates make it more difficult.
> regards, tom lane
>
>