Re: Significant performance issues with array_agg() + HashAggregate plans on Postgres 17

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Scott Carey <scott(dot)carey(at)algonomy(dot)com>, pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: Significant performance issues with array_agg() + HashAggregate plans on Postgres 17
Date: 2026-03-31 22:48:55
Message-ID: CAApHDvrngfAkGwixvd0nrR+Ya6XfjiXLRDqjy=h=4jkr2ME8jQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Wed, 1 Apr 2026 at 08:26, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Some thoughts:
>
> * Does it repro without the "vector" extension? Seems unlikely that
> that is related, but we're at the grasping-at-straws stage.
>
> * More grasping at straws: is this stock community Postgres, or
> some vendor's modification (eg RDS or Aurora)?
>
> * It would be worth doing the EXPLAINs with the SETTINGS option,
> just to make sure that there's not some non-default setting you
> forgot to mention.

Also grasping at straws and wondering if it's related to L3
contention. Hash tables mostly always have very unpredictable memory
access which the hardware prefetcher can't deal with. If useful
cachelines are being evicted from L3 by other processes, then that'll
mean more stalls waiting on RAM when probing the hash table.

I tried to see if I could recreate this on a 64 physical core machine,
and I can, but to nowhere near the same extent as what Scott showed.

work_mem = 200MB

drowley(at)amd3990x:~$ pgbench -n -f bench.sql -T 10 -M prepared postgres
| grep latency
latency average = 63.556 ms
drowley(at)amd3990x:~$ pgbench -n -f bench.sql -T 10 -M prepared -c 10 -j
10 postgres | grep latency
latency average = 66.002 ms
drowley(at)amd3990x:~$ pgbench -n -f bench.sql -T 10 -M prepared -c 30 -j
30 postgres | grep latency
latency average = 83.188 ms
drowley(at)amd3990x:~$ pgbench -n -f bench.sql -T 10 -M prepared -c 64 -j
64 postgres | grep latency
latency average = 168.449 ms

64 thread is 2.65x slower than 1.

work_mem = 10MB

drowley(at)amd3990x:~$ pgbench -n -f bench.sql -T 10 -M prepared postgres
| grep latency
latency average = 95.239 ms
drowley(at)amd3990x:~$ pgbench -n -f bench.sql -T 10 -M prepared -c 10 -j
10 postgres | grep latency
latency average = 101.870 ms
drowley(at)amd3990x:~$ pgbench -n -f bench.sql -T 10 -M prepared -c 30 -j
30 postgres | grep latency
latency average = 114.402 ms
drowley(at)amd3990x:~$ pgbench -n -f bench.sql -T 10 -M prepared -c 64 -j
64 postgres | grep latency
latency average = 161.147 ms

64 thread is 1.69x slower than 1. So, the slowdown is bigger when the
system is under more memory pressure.

I'm curious to know how consistent the run times are and if the
json_agg() query can be just as slow as the array_agg() one. Could it
be that the json_agg() version was just run at a time the server
wasn't as busy with other things... ?

David

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Scott Carey 2026-04-01 07:04:49 Re: Significant performance issues with array_agg() + HashAggregate plans on Postgres 17
Previous Message Tom Lane 2026-03-31 19:26:09 Re: Significant performance issues with array_agg() + HashAggregate plans on Postgres 17