Re: LogwrtResult contended spinlock

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>
Subject: Re: LogwrtResult contended spinlock
Date: 2024-02-21 20:24:42
Message-ID: b43615437ac7d7fdef86a36e5d5bf3fc049bc11b.camel@j-davis.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 2024-02-21 at 20:00 +0530, Bharath Rupireddy wrote:
> The local copy of LogwrtResult is so frequently used in the backends,
> if we were to replace it with atomic accesses, won't the atomic reads
> be costly and start showing up in perf profiles?

I don't see exactly where the extra cost would come from. What is your
concern?

In functions that access it several times, it may make sense to copy it
to a function-local variable, of course. But having a global non-shared
LogwrtResult doesn't seem particularly useful to me.

> In any case, I prefer
> discussing this separately once we get to a conclusion on converting
> shared memory Write and Flush ptrs to atomics.

I agree that it shouldn't block the earlier patches, but it seems on-
topic for this thread.

>
> 1. I guess we need to initialize the new atomics with
> pg_atomic_init_u64 initially in XLOGShmemInit:

Thank you.

> 2. I guess we need to update both the Write and Flush local copies in
> AdvanceXLInsertBuffer.

I agree. Whenever we update the non-shared LogwrtResult, let's update
the whole thing. Otherwise it's confusing.

> 3.
> @@ -6371,7 +6373,9 @@ GetFlushRecPtr(TimeLineID *insertTLI)
>  {
>      Assert(XLogCtl->SharedRecoveryState == RECOVERY_STATE_DONE);
>
> +    pg_read_barrier();
>      LogwrtResult.Flush = pg_atomic_read_u64(&XLogCtl-
> >LogwrtResult.Flush);
> +    LogwrtResult.Write = pg_atomic_read_u64(&XLogCtl-
> >LogwrtResult.Write);
>
> Do we need a read barrier here to not reorder things when more than
> one process is accessing the flush and write ptrs?

The above seems wrong: we don't want to reorder a load of Write before
a load of Flush, otherwise it could look like Write < Flush.
(Similarly, we never want to reorder a store of Flush before a store of
Write.) So I think we should have another read barrier between those
two atomic reads.

I also think we need the read barrier at the beginning because the
caller doesn't expect a stale value of the Flush pointer. It might not
be strictly required for correctness, because I don't think that stale
value can be inconsistent with anything else, but getting an up-to-date
value is better.

> If at all, a read
> barrier is warranted here, we can use atomic read with full barrier

I don't think we need a full barrier but I'm fine with using
pg_atomic_read_membarrier_u64() if it's better for whatever reason.

> +    pg_atomic_write_u64(&XLogCtl->LogwrtResult.Write, EndOfLog);
> +    pg_atomic_write_u64(&XLogCtl->LogwrtResult.Flush, EndOfLog);
>      XLogCtl->LogwrtRqst.Write = EndOfLog;
>
> pg_atomic_write_u64 here seems fine to me as no other process is
> active writing WAL yet, otherwise, we need write with full barrier

I don't see any memory ordering hazard. In any case, there's an
LWLockAcquire a few lines later, which acts as a full barrier.

> something like pg_write_barrier + pg_atomic_write_u64 or
> pg_atomic_exchange_u64 like in LWLockUpdateVar or use atomic write
> with full barrier sematices as proposed here (when that's in) -
> https://www.postgresql.org/message-id/20231127210030.GA140335%40nathanxps13
> .
>
> 5. I guess we'd better use pg_atomic_read_u64_impl and
> pg_atomic_compare_exchange_u64_impl in
> pg_atomic_monotonic_advance_u64
> to reduce one level of function indirections.

Aren't they inlined?

> 6.
> + * Full barrier semantics (even when value is unchanged).
>
> +    currval = pg_atomic_read_u64(ptr);
> +    if (currval >= target_)
> +    {
> +        pg_memory_barrier();
>
> Perhaps, we can use atomic read with full barrier sematices proposed
> here -
> https://www.postgresql.org/message-id/20231127210030.GA140335%40nathanxps13

I don't think they are exactly equivalent: in the current patch, the
first pg_atomic_read_u64() could be reordered with earlier reads;
whereas that wouldn't work if using pg_atomic_read_membarrier_u64() it
could not be. I'm not sure whether that could create a performance
problem or not.

> Just for the sake of completeness, do we need
> pg_atomic_monotonic_advance_u32 as well?

+1.

> 8. I'd split the addition of these new monotonic functions into 0001,
> then 0002 adding XLogwrtAtomic and 0003 adding Copy to XLogwrtAtomic.

+1

> 9.
> +    copyptr = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Copy);
> +    if (startptr + count > copyptr)
> +        ereport(WARNING,
> +                (errmsg("request to read past end of generated WAL;
> request %X/%X, current position %X/%X",
> +                        LSN_FORMAT_ARGS(startptr + count),
> LSN_FORMAT_ARGS(copyptr))));
>
> Any specific reason for this to be a WARNING rather than an ERROR?

Good question. WaitXLogInsertionsToFinish() uses a LOG level message
for the same situation. They should probably be the same log level, and
I would think it would be either PANIC or WARNING. I have no idea why
LOG was chosen.

> I've addressed some of my review comments (#1, #5, #7 and #8) above,
> merged the v11j-0002-fixup.patch into 0002, ran pgindent. Please see
> the attached v12 patch set. FWIW, CF bot is happy with these patches
> and also tests with --enable-atomics=no are clean.
>
> BTW, I couldn't find a CF entry for this, so I've registered one -
> https://commitfest.postgresql.org/47/4846/.

Thank you.

0001:

* The comments on the two versions of the functions are redundant, and
the style in that header seems to be to omit the comment from the u64
version. 

* I'm not sure the AssertPointerAlignment is needed in the u32 version?

0002:

* All updates to the non-shared LogwrtResult should update both values.
It's confusing to update those local values independently, because it
violates the invariant that LogwrtResult.Flush <= LogwrtResult.Write.

* pg_memory_barrier() is not needed right before a spinlock

* As mentioned above, I think GetFlushRecPtr() needs two read barriers.
Also, I think the same for GetXLogWriteRecPtr().

* In general, for any place using both Write and Flush, I think Flush
should be loaded first, followed by a read barrier, followed by a load
of the Write pointer. And I think in most of those places there should
be a read barrier before the load of Flush, too, to avoid a stale value
in places that might matter.

0003:

* We need to maintain the invariant that Copy >= Write >= Flush. I
believe that's always satisfied, because the
XLogWaitInsertionsToFinish() is always called before XLogWrite(). But
we should add an assert or runtime check of this invariant somewhere.

Alvaro, do you intend to review and/or commit this work eventually? If
you've set it down, I can take it. There are still a few details so I'm
not sure it's ready for commit quite yet, but it's getting closer.

Regards,
Jeff Davis

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jacob Champion 2024-02-21 20:26:51 Re: WIP Incremental JSON Parser
Previous Message Heikki Linnakangas 2024-02-21 19:19:49 Re: DSA_ALLOC_NO_OOM doesn't work