Re: Read-Write optimistic lock (Re: sinvaladt.c: remove msgnumLock, use atomic operations on maxMsgNum)

From: Olga Antonova <o(dot)antonova(at)postgrespro(dot)ru>
To: Andres Freund <andres(at)anarazel(dot)de>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Cc: Sergey Shinderuk <s(dot)shinderuk(at)postgrespro(dot)ru>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Read-Write optimistic lock (Re: sinvaladt.c: remove msgnumLock, use atomic operations on maxMsgNum)
Date: 2025-08-22 07:40:49
Message-ID: db4aca5c-c22b-4eb5-850d-212768f4fcac@postgrespro.ru
Lists: pgsql-hackers

Hi,

On 7/16/25 18:54, Andres Freund wrote:
> That was not in reply to the changed patch, but about the performance numbers
> you relayed. We had no repro, and even with the repro that Sergey has now
> delivered, we don't see similar levels of what you reported as contention.

We investigated this issue in detail and were able to reproduce the
spinlock contention in SIGetDataEntries. The problem is most evident on
multiprocessor systems with multiple NUMA nodes, but it also occurs on a
single node, albeit in a less pronounced form. We suspect the same
applies to CPUs with high clock frequencies.

We ran tests on two bare-metal servers:

4 NUMA nodes × 24 CPUs, Intel(R) Xeon(R) Gold 6348H CPU @ 2.30GHz.
PostgreSQL ran on 3 of the nodes (72 CPUs).

2 NUMA nodes × 32 CPUs, Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz.
PostgreSQL ran on a single node (32 CPUs).

We compared two PostgreSQL builds: the master branch, and master with
v5-0001-Read-Write-optimistic-spin-lock.patch applied.

To generate frequent cache invalidations, we ran a background workload
that, in a loop, creates a temporary table with ten indexes and then
drops it:

do $$
begin
  for i in 1..1000000 loop
    create temp table tt1 (
      f0 bigserial primary key,
      f1 int,
      f2 int,
      f3 int,
      f4 int,
      f5 int,
      f6 int,
      f7 int,
      f8 int,
      f9 int,
      f10 int);
    CREATE INDEX ON tt1(f1);
    CREATE INDEX ON tt1(f2);
    CREATE INDEX ON tt1(f3);
    CREATE INDEX ON tt1(f4);
    CREATE INDEX ON tt1(f5);
    CREATE INDEX ON tt1(f6);
    CREATE INDEX ON tt1(f7);
    CREATE INDEX ON tt1(f8);
    CREATE INDEX ON tt1(f9);
    CREATE INDEX ON tt1(f10);
    drop table tt1;
    commit;
  end loop;
end;
$$;

As a benchmark, we used a pgbench select-only scenario with 64 clients:

pgbench -U postgres -c 64 -j 32 -T 200 -s 100 -M prepared -b select-only postgres -n

For convenience, the full test is attached as test.sh, with a
description and setup instructions in the attached README.
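The TPS figures below are reported over 5 runs. A minimal sketch for repeating the pgbench command above and keeping only the TPS value of each run follows; it assumes the PostgreSQL 14+ pgbench summary line ("tps = N (without initial connection time)") and that the background DO-block workload is already running in another session. The helper names extract_tps and run_series are hypothetical and are not taken from the attached test.sh.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the attached test.sh.

# Read pgbench output on stdin and print the TPS value.
# Assumes the PostgreSQL 14+ summary line format:
#   tps = <number> (without initial connection time)
extract_tps() {
  awk '/^tps = / { print $3 }'
}

# Repeat the select-only benchmark N times (default 5) and print one
# TPS value per run. Assumes the cache-invalidation workload is already
# running concurrently in a separate session.
run_series() {
  runs=${1:-5}
  i=1
  while [ "$i" -le "$runs" ]; do
    pgbench -U postgres -c 64 -j 32 -T 200 -s 100 -M prepared \
      -b select-only -n postgres | extract_tps
    i=$((i + 1))
  done
}
```

For example, running "run_series 5 > tps_master.txt" against the unpatched build and "run_series 5 > tps_patched.txt" against the patched one gives paired per-run numbers to compare.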

During each test run, we recorded a 10-second system-wide perf profile
with:

perf record -F 99 -a -g --call-graph=dwarf -o perf_data sleep 10

We then generated flame graphs from the collected data.

1. Three NUMA nodes (72 CPUs)

According to the flame graph for the unpatched build
(fg_3numa_nopatch.xml), about 34% of the time in exec_bind_message is
spent in SIGetDataEntries, and more than 90% of that is spinlock wait.

With the patch, the share of SIGetDataEntries drops to about 6.6%, most
of the waiting shifts to LWLockAcquire, and RWOptSpinReadStart accounts
for only about 1.1% (fg_3numa_patch.xml). Across 5 runs, TPS improved by
6–8%.

Without patch: TPS = 731171.336542
With patch: TPS = 786077.155196

2. Single NUMA node (32 CPUs)

In this case the problem is less pronounced, but SIGetDataEntries still
takes 10.1% of exec_bind_message, 82.3% of which is spinlock wait
(fg_1numa_nopatch.xml).

With the patch, we observed a consistent 1.5–2% TPS increase across 5 runs.

Without patch: TPS = 518941.051825
With patch: TPS = 528768.641836
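As a cross-check, the relative gain for the single-run figures quoted above is (TPS with patch / TPS without patch - 1) × 100. A small sketch using awk; pct_gain is a hypothetical helper name, not something from the attached scripts:

```shell
#!/bin/sh
# Hypothetical helper: percentage TPS gain of "new" over "base",
# printed with two decimals.
pct_gain() {
  awk -v base="$1" -v new="$2" 'BEGIN { printf "%.2f\n", (new / base - 1) * 100 }'
}

pct_gain 731171.336542 786077.155196   # 3 NUMA nodes: prints 7.51
pct_gain 518941.051825 528768.641836   # 1 NUMA node: prints 1.89
```

Both values fall inside the ranges reported over 5 runs (6–8% and 1.5–2%).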

Flame graphs do not show absolute time, but the relative distribution
confirms contention on the spinlock in SIGetDataEntries. The problem is
real and becomes a bottleneck under high load, especially on
multi-socket NUMA systems. The patch mitigates this contention and
improves throughput.

---
Best regards,
Olga Antonova

Attachment            Content-Type               Size
test.sh               application/x-shellscript  1.3 KB
README                text/plain                 943 bytes
fg_3numa_patch.xml    text/xml                   651.1 KB
fg_3numa_nopatch.xml  text/xml                   662.6 KB
fg_1numa_nopatch.xml  text/xml                   718.7 KB