| From: | David Rowley <dgrowleyml(at)gmail(dot)com> |
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
| Cc: | Scott Carey <scott(dot)carey(at)algonomy(dot)com>, pgsql-performance(at)lists(dot)postgresql(dot)org |
| Subject: | Re: Significant performance issues with array_agg() + HashAggregate plans on Postgres 17 |
| Date: | 2026-03-31 22:48:55 |
| Message-ID: | CAApHDvrngfAkGwixvd0nrR+Ya6XfjiXLRDqjy=h=4jkr2ME8jQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-performance |
On Wed, 1 Apr 2026 at 08:26, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Some thoughts:
>
> * Does it repro without the "vector" extension? Seems unlikely that
> that is related, but we're at the grasping-at-straws stage.
>
> * More grasping at straws: is this stock community Postgres, or
> some vendor's modification (eg RDS or Aurora)?
>
> * It would be worth doing the EXPLAINs with the SETTINGS option,
> just to make sure that there's not some non-default setting you
> forgot to mention.
Also grasping at straws and wondering if it's related to L3
contention. Hash tables mostly always have very unpredictable memory
access which the hardware prefetcher can't deal with. If useful
cachelines are being evicted from L3 by other processes, then that'll
mean more stalls waiting on RAM when probing the hash table.
I tried to see if I could recreate this on a 64 physical core machine,
and I can, but to nowhere near the same extent as what Scott showed.
work_mem = 200MB
drowley(at)amd3990x:~$ pgbench -n -f bench.sql -T 10 -M prepared postgres
| grep latency
latency average = 63.556 ms
drowley(at)amd3990x:~$ pgbench -n -f bench.sql -T 10 -M prepared -c 10 -j
10 postgres | grep latency
latency average = 66.002 ms
drowley(at)amd3990x:~$ pgbench -n -f bench.sql -T 10 -M prepared -c 30 -j
30 postgres | grep latency
latency average = 83.188 ms
drowley(at)amd3990x:~$ pgbench -n -f bench.sql -T 10 -M prepared -c 64 -j
64 postgres | grep latency
latency average = 168.449 ms
64 thread is 2.65x slower than 1.
work_mem = 10MB
drowley(at)amd3990x:~$ pgbench -n -f bench.sql -T 10 -M prepared postgres
| grep latency
latency average = 95.239 ms
drowley(at)amd3990x:~$ pgbench -n -f bench.sql -T 10 -M prepared -c 10 -j
10 postgres | grep latency
latency average = 101.870 ms
drowley(at)amd3990x:~$ pgbench -n -f bench.sql -T 10 -M prepared -c 30 -j
30 postgres | grep latency
latency average = 114.402 ms
drowley(at)amd3990x:~$ pgbench -n -f bench.sql -T 10 -M prepared -c 64 -j
64 postgres | grep latency
latency average = 161.147 ms
64 thread is 1.69x slower than 1. So, the slowdown is bigger when the
system is under more memory pressure.
I'm curious to know how consistent the run times are and if the
json_agg() query can be just as slow as the array_agg() one. Could it
be that the json_agg() version was just run at a time the server
wasn't as busy with other things... ?
David
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Scott Carey | 2026-04-01 07:04:49 | Re: Significant performance issues with array_agg() + HashAggregate plans on Postgres 17 |
| Previous Message | Tom Lane | 2026-03-31 19:26:09 | Re: Significant performance issues with array_agg() + HashAggregate plans on Postgres 17 |