| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
|---|---|
| To: | Scott Carey <scott(dot)carey(at)algonomy(dot)com> |
| Cc: | Jeff Davis <pgsql(at)j-davis(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, pgsql-performance(at)lists(dot)postgresql(dot)org |
| Subject: | Re: Significant performance issues with array_agg() + HashAggregate plans on Postgres 17 |
| Date: | 2026-04-03 19:24:15 |
| Message-ID: | 3825075.1775244255@sss.pgh.pa.us |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-performance |
I wrote:
> So the main problem here is we're leaking the arrays made by
> array_agg, and a secondary problem is that that drives the
> cost of hash_agg_check_limits to an unacceptable level.
After further study I no longer think there's a leak. This
test query involves 50000 GROUP BY groups, and we have an
array being accumulated for each one. The difference between
the fast array_agg implementation and the slow one is just
that the fast one keeps all its working state in the aggregate
context, while the slow one makes a separate sub-context for
each "expanded array". v12 creates 50000 "expanded arrays"
too, but it's not noticeably slow.
So the problem is exactly that repeating hash_agg_check_limits
each time we start a new group is O(N^2), because if there is
a sub-context for each group then MemoryContextMemAllocated
requires O(N) time.
The other little problem with this approach is acknowledged in the
comments:
* ... Allocations may happen without adding new groups (for instance,
* if the transition state size grows), so this check is imperfect.
So really the whole thing is kind of unsatisfactory.
I'm not seeing cheap ways to make it better though.
A very quick and dirty idea is to tell MemoryContextMemAllocated
not to recurse, which would restore it to constant-time. But
that would make the results much less accurate in cases like
this one.
regards, tom lane
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Jeff Davis | 2026-04-03 19:24:49 | Re: Significant performance issues with array_agg() + HashAggregate plans on Postgres 17 |
| Previous Message | Pavel Stehule | 2026-04-03 19:10:13 | Re: proposal: schema variables |