Re: LogwrtResult contended spinlock

From: Andres Freund <andres(at)anarazel(dot)de>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Jaime Casanova <jaime(dot)casanova(at)2ndquadrant(dot)com>
Subject: Re: LogwrtResult contended spinlock
Date: 2021-01-30 02:30:11
Message-ID: 20210130023011.n545o54j65t4kgxn@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2021-01-29 22:47:56 -0300, Alvaro Herrera wrote:
> So I tried this, but -- perhaps not suprisingly -- I can't get it to
> work properly; the synchronization fails.

What do you mean by "synchronization fails"?

> Strangely, all tests work for me, but the pg_upgrade one in particular
> fails.

It's one of the few tests using fsync=on.

> +static inline void
> +pg_atomic_monotonic_advance_u64(volatile pg_atomic_uint64 *ptr, uint64 target_)
> +{
> +#ifndef PG_HAVE_ATOMIC_U64_SIMULATION
> + AssertPointerAlignment(ptr, 8);
> +#endif
> + /* FIXME is this algorithm correct if we have u64 simulation? */

I don't see a problem.

> + while (true)
> + {
> + uint64 currval;
> +
> + currval = pg_atomic_read_u64(ptr);
> + if (currval > target_)
> + break; /* already done by somebody else */
> + if (pg_atomic_compare_exchange_u64(ptr, &currval, target_))
> + break; /* successfully done */
> + }
> +}

I suggest writing this as

currval = pg_atomic_read_u64(ptr);
while (currval < target_)
{
if (pg_atomic_compare_exchange_u64(ptr, &currval, target_))
break;
}

> /*
> * Inserting to WAL is protected by a small fixed number of WAL insertion
> @@ -596,8 +599,10 @@ typedef struct XLogCtlData
> {
> XLogCtlInsert Insert;
>
> + XLogwrtAtomic LogwrtRqst;
> + XLogwrtAtomic LogwrtResult;
> +
> /* Protected by info_lck: */
> - XLogwrtRqst LogwrtRqst;

Not sure putting these into the same cacheline is a good idea.

> @@ -2166,12 +2163,11 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, bool opportunistic)
> if (opportunistic)
> break;
>
> - /* Before waiting, get info_lck and update LogwrtResult */
> - SpinLockAcquire(&XLogCtl->info_lck);
> - if (XLogCtl->LogwrtRqst.Write < OldPageRqstPtr)
> - XLogCtl->LogwrtRqst.Write = OldPageRqstPtr;
> - LogwrtResult = XLogCtl->LogwrtResult;
> - SpinLockRelease(&XLogCtl->info_lck);
> + /* Before waiting, update LogwrtResult */
> + pg_atomic_monotonic_advance_u64(&XLogCtl->LogwrtRqst.Write, OldPageRqstPtr);
> +
> + LogwrtResult.Write = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Write);
> + LogwrtResult.Flush = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Flush);

I don't think it's quite as easy as this. Write/Flush now aren't
guaranteed to be coherent with each other - previously they were. And
because it's in a global variable used everywhere, we can't just be more
careful about update protocols in one place...

We also shouldn't re-read a variable that we just did via the
pg_atomic_monotonic_advance_u64().

I think we should stop updating both the Write/Flush position at the
same time. That way we don't have an expectation of them being coherent
with each other. Most places really don't need both anyway.

> {
> - SpinLockAcquire(&XLogCtl->info_lck);
> - XLogCtl->LogwrtResult = LogwrtResult;
> - if (XLogCtl->LogwrtRqst.Write < LogwrtResult.Write)
> - XLogCtl->LogwrtRqst.Write = LogwrtResult.Write;
> - if (XLogCtl->LogwrtRqst.Flush < LogwrtResult.Flush)
> - XLogCtl->LogwrtRqst.Flush = LogwrtResult.Flush;
> - SpinLockRelease(&XLogCtl->info_lck);
> + pg_atomic_monotonic_advance_u64(&XLogCtl->LogwrtResult.Write, LogwrtResult.Write);
> + pg_atomic_monotonic_advance_u64(&XLogCtl->LogwrtResult.Flush, LogwrtResult.Flush);
> +
> + pg_atomic_monotonic_advance_u64(&XLogCtl->LogwrtRqst.Write, LogwrtResult.Write);
> + pg_atomic_monotonic_advance_u64(&XLogCtl->LogwrtRqst.Flush, LogwrtResult.Flush);

Hm. We went from one cheap atomic operation (SpinLockAcquire) to four
expensive ones in the happy path. That's not free...

I don't think we need to manipulate XLogCtl->LogwrtResult.* using atomic
ops - they can only be updated with WALWriteLock held, right?

XLogCtl->LogwrtResult was updated with plain assignment before, why did
you change it to pg_atomic_monotonic_advance_u64()?

> @@ -1169,6 +1169,8 @@ XLogInsertRecord(XLogRecData *rdata,
> {
> /* advance global request to include new block(s) */
> pg_atomic_monotonic_advance_u64(&XLogCtl->LogwrtRqst.Write, EndPos);
> + pg_memory_barrier();
> +

That's not really useful - the path that actually updates already
implies a barrier. It'd probably be better to add a barrier to a "never
executed cmpxchg" fastpath.

> @@ -2905,6 +2909,7 @@ XLogFlush(XLogRecPtr record)
> WriteRqstPtr = pg_atomic_read_u64(&XLogCtl->LogwrtRqst.Write);
> LogwrtResult.Write = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Write);
> LogwrtResult.Flush = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Flush);
> + pg_read_barrier();

I'm not sure you really can get away with just a read barrier in these
places. We can't e.g. have later updates to other shared memory
variables be "moved" to before the barrier (which a read barrier
allows).

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2021-01-30 02:33:36 Re: Should we make Bitmapsets a kind of Node?
Previous Message Michael Paquier 2021-01-30 02:23:01 Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly