Re: LogwrtResult contended spinlock

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>
Subject: Re: LogwrtResult contended spinlock
Date: 2024-03-14 07:38:23
Message-ID: CALj2ACUZAQfTfHwB8BN0Jm2uAw+ij238qn_TRM3-4OCAGs8OYw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Mar 4, 2024 at 9:15 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> > 0003:
> >
> > * We need to maintain the invariant that Copy >= Write >= Flush. I
> > believe that's always satisfied, because the
> > XLogWaitInsertionsToFinish() is always called before XLogWrite(). But
> > we should add an assert or runtime check of this invariant somewhere.
>
> Yes, that invariant is already maintained by the server. Although, I'm
> not fully agree, I added an assertion to WaitXLogInsertionsToFinish
> after updating XLogCtl->LogwrtResult.Copy. CF bot is happy with it -
> https://github.com/BRupireddy2/postgres/tree/atomic_LogwrtResult_v13.

I've now separated these invariants out into the 0004 patch.

With the assertions placed in WaitXLogInsertionsToFinish after
updating Copy ptr, I observed the assertion failing in one of the CF
bot machines - https://cirrus-ci.com/build/6202112288227328. I could
reproduce it locally with [1]. I guess the reason is that the Write
and Flush ptrs are now updated independently and atomically without
lock, they might drift and become out-of-order for a while if
concurrently they are accessed in WaitXLogInsertionsToFinish. So, I
guess the right place to verify the invariant Copy >= Write >= Flush
is in XLogWrite once Write and Flush ptrs in shared memory are updated
(note that only one process at a time can do this). Accordingly, I've
moved the assertions to XLogWrite in the attached v14-0004 patch.

> Please see the attached v13 patch set for further review.

Earlier versions of the patches removed a piece of code ensuring
shared WAL 'request' values did not fall beading the 'result' values.
There's a good reason for us to have it. So, I restored it.

- /*
- * Update shared-memory status
- *
- * We make sure that the shared 'request' values do not fall behind the
- * 'result' values. This is not absolutely essential, but it saves some
- * code in a couple of places.
- */

Please see the attached v14 patch set.

[1] for i in {1..100}; do make check
PROVE_TESTS="t/027_stream_regress.pl"; if [ $? -ne 0 ]; then echo "The
command failed on iteration $i"; break; fi; done

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment Content-Type Size
v14-0001-Add-monotonic-advancement-functions-for-atomics.patch application/octet-stream 2.4 KB
v14-0002-Make-XLogCtl-LogwrtResult-accessible-with-atomic.patch application/octet-stream 10.5 KB
v14-0003-Add-Copy-pointer-to-track-data-copied-to-WAL-buf.patch application/octet-stream 3.3 KB
v14-0004-Add-invariants-for-shared-LogwrtResult-members.patch application/octet-stream 1.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2024-03-14 07:42:10 Re: type cache cleanup improvements
Previous Message Anton A. Melnikov 2024-03-14 07:08:50 Re: May be BUG. Periodic burst growth of the checkpoint_req counter on replica.