From: | Salvatore Dipietro <dipietro(dot)salvatore(at)gmail(dot)com> |
---|---|
To: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Salvatore Dipietro <dipiets(at)amazon(dot)com>, blakgeof(at)amazon(dot)com |
Subject: | Re: Remove Instruction Synchronization Barrier in spin_delay() for ARM64 architecture |
Date: | 2025-06-19 19:10:52 |
Message-ID: | CAGnuAhW_ZrMjqk3-ZSREjwr8X7vTzy6JpvFdLevY_ZBRQWtFWQ@mail.gmail.com |
Lists: | pgsql-hackers |
On Mon, 19 May 2025 at 09:38, Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
> Could you retry your tests on v18devel? It might also be useful to
> repeat the tests on a variety of hardware to ensure it's a win across
> the board.
Hi Nathan,
Thanks for your clarification. As you requested, I have run more
tests on different instance types and sizes.
In particular, I ran the `test_shm_mq_pipelined` benchmark on
Ubuntu 22.04 on m7g.[2,8,16]xlarge and
c8g.[2,8,24]xlarge instances with the PG master branch (commit:
84914e964b4). Each test was repeated 30 times;
the results are the average time (in seconds) and the difference from baseline (Master).
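For readers who want to reproduce the "(N.NNx)" figures, here is a minimal sketch of how such a cell could be derived. It assumes the ratio is the mean Master time divided by the mean No-ISB time, which is consistent with the reported numbers (for example 35.9s / 4.1s ≈ 8.76x). The function names and the per-run timings are hypothetical and not taken from the benchmark scripts.

```python
from statistics import mean


def speedup(master_runs, patched_runs):
    """Return (mean master, mean patched, master/patched ratio).

    A ratio above 1.0 means the patched build finished faster.
    """
    base = mean(master_runs)
    patched = mean(patched_runs)
    return base, patched, base / patched


def format_cell(master_runs, patched_runs):
    """Format a No-ISB table cell such as '4.1s (8.76x)'."""
    _, patched, ratio = speedup(master_runs, patched_runs)
    return f"{patched:.1f}s ({ratio:.2f}x)"


if __name__ == "__main__":
    # Hypothetical per-run timings in seconds (illustration only).
    master = [35.8, 35.9, 36.0]
    no_isb = [4.0, 4.1, 4.2]
    print(format_cell(master, no_isb))
```

With this convention, values below 1.00x indicate a regression relative to Master.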
Graviton3 instances (m7g) results:
| Concurrency | Loops | 2xl Master | 2xl No-ISB | 8xl Master | 8xl No-ISB | 16xl Master | 16xl No-ISB |
|-------------|----------|------------|---------------|------------|---------------|-------------|---------------|
| 1 | 10000000 | 1.9s | 1.9s (1.00x) | 1.9s | 1.8s (1.06x) | 1.7s | 1.8s (0.94x) |
| 2 | 10000000 | 2.4s | 2.4s (1.00x) | 2.5s | 2.3s (1.09x) | 2.3s | 2.3s (1.00x) |
| 4 | 10000000 | 3.8s | 3.8s (1.00x) | 3.8s | 3.6s (1.06x) | 3.5s | 3.8s (0.92x) |
| 8 | 10000000 | 8.9s | 10.3s (0.86x) | 7.5s | 8.6s (0.87x) | 7.8s | 8.9s (0.88x) |
| 16 | 10000000 | 21.6s | 22.5s (0.96x) | 22.5s | 23.6s (0.95x) | 21.4s | 24.9s (0.86x) |
| 32 | 10000000 | 42.8s | 41.3s (1.04x) | 114.7s | 52.0s (2.21x) | 88.6s | 49.9s (1.78x) |
| 64 | 10000000 | 81.8s | 73.3s (1.12x) | 395.9s | 85.2s (4.65x) | 381.3s | 97.0s (3.93x) |
| 32 | 100000 | 0.4s | 0.4s (1.00x) | 1.1s | 0.5s (2.20x) | 1.1s | 0.6s (1.83x) |
| 64 | 100000 | 0.8s | 0.8s (1.00x) | 3.9s | 0.9s (4.33x) | 3.9s | 1.1s (3.55x) |
| 128 | 100000 | 1.6s | 1.5s (1.07x) | 8.5s | 1.9s (4.47x) | 13.3s | 2.0s (6.65x) |
| 256 | 100000 | 3.2s | 3.1s (1.03x) | 19.8s | 4.0s (4.95x) | 35.9s | 4.1s (8.76x) |
Graviton4 instances (c8g) results:
| Concurrency | Loops | 2xl Master | 2xl No-ISB | 8xl Master | 8xl No-ISB | 24xl Master | 24xl No-ISB |
|-------------|----------|------------|---------------|------------|---------------|-------------|----------------|
| 1 | 10000000 | 1.7s | 1.6s (1.06x) | 1.6s | 1.6s (1.00x) | 1.6s | 1.5s (1.07x) |
| 2 | 10000000 | 2.2s | 2.2s (1.00x) | 2.2s | 2.2s (1.00x) | 2.2s | 2.1s (1.05x) |
| 4 | 10000000 | 3.4s | 3.5s (0.97x) | 3.5s | 3.4s (1.03x) | 3.5s | 3.4s (1.03x) |
| 8 | 10000000 | 10.9s | 13.9s (0.78x) | 8.2s | 9.4s (0.87x) | 7.8s | 8.2s (0.95x) |
| 16 | 10000000 | 23.6s | 27.0s (0.87x) | 26.3s | 26.1s (1.01x) | 27.1s | 28.1s (0.96x) |
| 32 | 10000000 | 44.6s | 46.9s (0.95x) | 60.6s | 47.7s (1.27x) | 62.1s | 50.4s (1.23x) |
| 64 | 10000000 | 81.4s | 81.5s (1.00x) | 189.4s | 91.5s (2.07x) | 176.9s | 101.3s (1.75x) |
| 32 | 100000 | 0.5s | 0.5s (1.00x) | 0.6s | 0.5s (1.20x) | 0.6s | 0.5s (1.20x) |
| 64 | 100000 | 0.8s | 0.8s (1.00x) | 1.7s | 0.9s (1.89x) | 2.1s | 1.2s (1.75x) |
| 128 | 100000 | 1.5s | 1.6s (0.94x) | 4.5s | 1.9s (2.37x) | 7.8s | 2.1s (3.71x) |
| 256 | 100000 | 3.3s | 3.1s (1.06x) | 9.7s | 4.1s (2.37x) | 22.0s | 4.5s (4.89x) |
With low concurrency (1, 2, 4) the results are similar,
while with medium concurrency (8, 16)
the No-ISB approach can introduce some regression, especially on
smaller instances (as low as 0.78x on c8g.2xl with concurrency 8).
However, there is a significant positive performance impact with
high concurrency (>=32) on large instances
(up to 8.76x on m7g.16xl with concurrency 256).