From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Salvatore Dipietro <dipietro(dot)salvatore(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Salvatore Dipietro <dipiets(at)amazon(dot)com>, blakgeof(at)amazon(dot)com |
Subject: | Re: Remove Instruction Synchronization Barrier in spin_delay() for ARM64 architecture |
Date: | 2025-05-01 19:59:09 |
Message-ID: | aBPSjaY3Oki0pxHZ@nathan |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, May 01, 2025 at 02:48:59PM -0400, Tom Lane wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Wed, Apr 30, 2025 at 4:53 AM Salvatore Dipietro
>> <dipietro(dot)salvatore(at)gmail(dot)com> wrote:
>>> we would like to propose the removal of the Instruction
>>> Synchronization Barrier (isb) for aarch64 architectures. Based on our
>>> testing on Graviton instances (m7g.16xlarge), we can see on average
>>> over multiple iterations up to 12% better performance using PGBench
>>> select-only and up to 9% with Sysbench oltp_read_only workloads. On
>>> Graviton4 (m8g.24xlarge) results are up to 8% better using PGBench
>>> select-only and up to 6% with Sysbench oltp_read_only workloads.
>>> We have also tested it putting more pressure on the spin_delay
>>> function, enabling pg_stat_statements.track_planning with PGBench
>>> read-only [0] and, on average, the patch shows up to 27% better
>>> performance on m6g.16xlarge and up to 37% on m7g.16xlarge.
>
>> I think you should make some kind of argument about why the previous
>> conclusion was wrong, or why something's changed between then and now.
>
> TBH, those numbers are large enough that I flat out don't believe
> them. As noted in the previous thread, we've managed to squeeze out
> a lot of our dependencies on spinlock performance, via algorithmic
> changes, migration to atomic ops, etc. So I think ten-percent-ish
> improvement on a plain pgbench test just isn't very plausible.
> We certainly didn't see that kind of effect when we were doing that
> earlier round of testing --- we had to use a custom testing lashup
> to get numbers that were outside the noise at all.
I'm not sure there's actually any hot spinlock involved in a regular
select-only pgbench (unless perhaps pg_stat_statements was enabled or
something). But I do know pg_stat_statements.track_planning stresses the
spinlock code quite a bit, and commit 3d0b4b1 recently added a non-locking
initial test in AArch64's TAS_SPIN, so I wonder if the ISB is still
appropriate. It'd be interesting to see the performance difference of
removing the ISB with and without commit 3d0b4b1 applied.
--
nathan
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2025-05-01 20:08:06 | Re: Remove Instruction Synchronization Barrier in spin_delay() for ARM64 architecture |
Previous Message | Dmitry Dolgov | 2025-05-01 19:55:47 | Re: queryId constant squashing does not support prepared statements |