Re: WAL Insertion Lock Improvements

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: WAL Insertion Lock Improvements
Date: 2023-04-10 04:08:43
Lists: pgsql-hackers

On Mon, Feb 20, 2023 at 09:49:48PM -0800, Nathan Bossart wrote:
> I'm marking this as ready-for-committer. I think a couple of the comments
> could use some small adjustments, but that probably doesn't need to hold up
> this patch.

Apologies. I was planning to have a thorough look at this patch but
life got in the way and I have not been able to study what's happening
on this thread this close to the feature freeze.

Anyway, I am attaching two modules I have written for the sake of this
thread while beginning my review of the patch:
- lwlock_test.tar.gz, validation module for LWLocks with variable
waits. This module can be loaded with shared_preload_libraries to
have two LWLocks and two variables in shmem, then have 2 backends play
ping-pong with each other's locks. An isolation test may be possible,
though I have not thought hard about it. Just use a SQL sequence like
this one, with N > 1 (see README):
Backend 1: SELECT lwlock_test_acquire();
Backend 2: SELECT lwlock_test_wait(N);
Backend 1: SELECT lwlock_test_update(N);
Backend 1: SELECT lwlock_test_release();
- custom_wal.tar.gz, thin wrapper for LogLogicalMessage() able to
generate N records of size M bytes in a single SQL call. This can be
used to generate records of various sizes for benchmarking, limiting
the overhead of individual calls to pg_logical_emit_message_bytea().
I have begun gathering numbers with WAL records of various sizes and
counts, using pgbench like this:
$ cat script.sql
\set record_size 1
\set record_number 5000
SELECT custom_wal(:record_size, :record_number);
$ pgbench -n -c 500 -t 100 -f script.sql
So this mostly removes the overhead of parsing, planning, and most of
the INSERT logic.

I have been trying to get some reproducible numbers, but I think that
I am going to need a bigger machine than what I have been using for
the last few days, up to 400 connections. It is worth noting that
00d1e02b may influence the results a bit, so we may want to gather
more numbers with that in place, particularly with INSERTs; one of the
tests used upthread relies on single-row INSERTs.

Another question I had: would it be worth running some tests with
pg_wal/ mounted on a tmpfs so that I/O is not a bottleneck? It
should be instructive to get more measurements with a fixed number of
transactions and a rather high number of concurrent connections (1k at
least?), where the contention would be on the variable waits. My
first impression is that the records should not be too small if you
want to see the effects of this patch more clearly, either.

Looking at the patch.. LWLockConflictsWithVar() and
LWLockReleaseClearVar() are the trivial bits. These are OK.

+ * NB: Note the use of pg_atomic_exchange_u64 as opposed to just
+ * pg_atomic_write_u64 to update the value. Since pg_atomic_exchange_u64 is
+ * a full barrier, we're guaranteed that the subsequent shared memory
+ * reads/writes, if any, happen after we reset the lock variable.

This mentions that the subsequent read/write operations are safe, so
this refers to anything happening after the variable is reset. Since
this is a full barrier, should we also mention that it is ordered with
respect to anything the caller did before clearing the variable?
From this perspective, using pg_atomic_exchange_u64() makes sense to me
in LWLockReleaseClearVar().

+ * XXX: Use of a spinlock at the beginning of this function to read
+ * current insert position implies memory ordering. That means that
+ * the immediate loads and stores to shared memory (for instance,
+ * in LWLockUpdateVar called via LWLockWaitForVar) don't need an
+ * explicit memory barrier as far as the current usage is
+ * concerned. But that might not be safe in general.
What's the part where this is not safe? Based on what I see, this
code path is safe because of the previous spinlock. This is the same
comment as at the beginning of LWLockConflictsWithVar(). Is that
something that we ought to document at the top of LWLockWaitForVar()
as well? We have one caller of this function currently, but there may
be more in the future.

- * you're about to write out.
+ * you're about to write out. Using an atomic variable for insertingAt avoids
+ * taking any explicit lock for reads and writes.

Hmm. Not sure that we need this comment at all.

-LWLockUpdateVar(LWLock *lock, uint64 *valptr, uint64 val)
+LWLockUpdateVar(LWLock *lock, pg_atomic_uint64 *valptr, uint64 val)
Assert(pg_atomic_read_u32(&lock->state) & LW_VAL_EXCLUSIVE);

- /* Update the lock's value */
- *valptr = val;

The sensitive change is in LWLockUpdateVar(). I am not completely
sure I understand this removal, though. Does it influence the case
where there are waiters?

Another thing I was wondering about: how much does the fast path used
in LWLockUpdateVar() influence the performance numbers? Am I right to
guess that it accounts for most of the gain seen? Or could it be that
the removal of the spinlock in
LWLockConflictsWithVar()/LWLockWaitForVar() is the change with the
highest effect?

Attachment Content-Type Size
lwlock_test.tar.gz application/gzip 3.1 KB
custom_wal.tar.gz application/gzip 1.1 KB
