Quick Links

Re: Remove Instruction Synchronization Barrier in spin_delay() for ARM64 architecture

From:	Nathan Bossart <nathandbossart(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Salvatore Dipietro <dipietro(dot)salvatore(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Salvatore Dipietro <dipiets(at)amazon(dot)com>, blakgeof(at)amazon(dot)com
Subject:	Re: Remove Instruction Synchronization Barrier in spin_delay() for ARM64 architecture
Date:	2025-05-01 19:59:09
Message-ID:	aBPSjaY3Oki0pxHZ@nathan
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Thu, May 01, 2025 at 02:48:59PM -0400, Tom Lane wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Wed, Apr 30, 2025 at 4:53 AM Salvatore Dipietro
>> <dipietro(dot)salvatore(at)gmail(dot)com> wrote:
>>> we would like to propose the removal of the Instruction
>>> Synchronization Barrier (isb) for aarch64 architectures. Based on our
>>> testing on Graviton instances (m7g.16xlarge), we can see on average
>>> over multiple iterations up to 12% better performance using PGBench
>>> select-only and up to 9% with Sysbench oltp_read_only workloads. On
>>> Graviton4 (m8g.24xlarge) results are up to 8% better using PGBench
>>> select-only and up to 6% with Sysbench oltp_read_only workloads.
>>> We have also tested it putting more pressure on the spin_delay
>>> function, enabling pg_stat_statements.track_planning with PGBench
>>> read-only [0] and, on average, the patch shows up to 27% better
>>> performance on m6g.16xlarge and up to 37% on m7g.16xlarge.
>
>> I think you should make some kind of argument about why the previous
>> conclusion was wrong, or why something's changed between then and now.
>
> TBH, those numbers are large enough that I flat out don't believe
> them. As noted in the previous thread, we've managed to squeeze out
> a lot of our dependencies on spinlock performance, via algorithmic
> changes, migration to atomic ops, etc. So I think ten-percent-ish
> improvement on a plain pgbench test just isn't very plausible.
> We certainly didn't see that kind of effect when we were doing that
> earlier round of testing --- we had to use a custom testing lashup
> to get numbers that were outside the noise at all.

I'm not sure there's actually any hot spinlock involved in a regular
select-only pgbench (unless perhaps pg_stat_statements was enabled or
something). But I do know pg_stat_statements.track_planning stresses the
spinlock code quite a bit, and commit 3d0b4b1 recently added a non-locking
initial test in AArch64's TAS_SPIN, so I wonder if the ISB is still
appropriate. It'd be interesting to see the performance difference of
removing the ISB with and without commit 3d0b4b1 applied.

--
nathan

In response to

Re: Remove Instruction Synchronization Barrier in spin_delay() for ARM64 architecture at 2025-05-01 18:48:59 from Tom Lane

Responses

Re: Remove Instruction Synchronization Barrier in spin_delay() for ARM64 architecture at 2025-05-01 20:08:06 from Tom Lane

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Tom Lane	2025-05-01 20:08:06	Re: Remove Instruction Synchronization Barrier in spin_delay() for ARM64 architecture
Previous Message	Dmitry Dolgov	2025-05-01 19:55:47	Re: queryId constant squashing does not support prepared statements