Re: [Patch] ALTER SYSTEM READ ONLY

From: Amul Sul <sulamul(at)gmail(dot)com>
To: Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Prabhat Sahu <prabhat(dot)sahu(at)enterprisedb(dot)com>
Subject: Re: [Patch] ALTER SYSTEM READ ONLY
Date: 2021-10-05 10:41:58
Message-ID: CAAJ_b9549-u2qo4hgGcE94vcv6sTHUO_Eq+5vB1_-Av5QokJLw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Oct 4, 2021 at 1:57 PM Rushabh Lathia
<rushabh(dot)lathia(at)gmail(dot)com> wrote:
>
>
>
> On Fri, Oct 1, 2021 at 2:29 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>
>> On Thu, Sep 30, 2021 at 7:59 AM Amul Sul <sulamul(at)gmail(dot)com> wrote:
>> > To find the value of InRecovery after we clear it, patch still uses
>> > ControlFile's DBState, but now the check condition changed to a more
>> > specific one which is less confusing.
>> >
>> > In casual off-list discussion, the point was made to check
>> > SharedRecoveryState to find out the InRecovery value afterward, and
>> > check that using RecoveryInProgress(). But we can't depend on
>> > SharedRecoveryState because at the start it gets initialized to
>> > RECOVERY_STATE_CRASH irrespective of InRecovery that happens later.
>> > Therefore, we can't use RecoveryInProgress() which always returns
>> > true if SharedRecoveryState != RECOVERY_STATE_DONE.
>>
>> Uh, this change has crept into 0002, but it should be in 0004 with the
>> rest of the changes to remove dependencies on variables specific to
>> the startup process. Like I said before, we should really be trying to
>> separate code movement from functional changes.

Well, I have to replace the InRecovery flag in that patch since we are
moving code past to the point where the InRecovery flag gets cleared.
If I don't do, then the 0002 patch would be wrong since InRecovery is
always false, and behaviour won't be the same as it was before that
patch.

>> Also, 0002 doesn't
>> actually apply for me. Did you generate these patches with 'git
>> format-patch'?
>>
>> [rhaas pgsql]$ patch -p1 <
>> ~/Downloads/v36-0001-Refactor-some-end-of-recovery-code-out-of-Startu.patch
>> patching file src/backend/access/transam/xlog.c
>> Hunk #1 succeeded at 889 (offset 9 lines).
>> Hunk #2 succeeded at 939 (offset 12 lines).
>> Hunk #3 succeeded at 5734 (offset 37 lines).
>> Hunk #4 succeeded at 8038 (offset 70 lines).
>> Hunk #5 succeeded at 8248 (offset 70 lines).
>> [rhaas pgsql]$ patch -p1 <
>> ~/Downloads/v36-0002-Postpone-some-end-of-recovery-operations-relatin.patch
>> patching file src/backend/access/transam/xlog.c
>> Reversed (or previously applied) patch detected! Assume -R? [n]
>> Apply anyway? [n] y
>> Hunk #1 FAILED at 7954.
>> Hunk #2 succeeded at 8079 (offset 70 lines).
>> 1 out of 2 hunks FAILED -- saving rejects to file
>> src/backend/access/transam/xlog.c.rej
>> [rhaas pgsql]$ git reset --hard
>> HEAD is now at b484ddf4d2 Treat ETIMEDOUT as indicating a
>> non-recoverable connection failure.
>> [rhaas pgsql]$ patch -p1 <
>> ~/Downloads/v36-0002-Postpone-some-end-of-recovery-operations-relatin.patch
>> patching file src/backend/access/transam/xlog.c
>> Reversed (or previously applied) patch detected! Assume -R? [n]
>> Apply anyway? [n]
>> Skipping patch.
>> 2 out of 2 hunks ignored -- saving rejects to file
>> src/backend/access/transam/xlog.c.rej
>>
>
> I tried to apply the patch on the master branch head and it's failing
> with conflicts.
>

Thanks, Rushabh, for the quick check, I have attached a rebased version for the
latest master head commit # f6b5d05ba9a.

> Later applied patch on below commit and it got applied cleanly:
>
> commit 7d1aa6bf1c27bf7438179db446f7d1e72ae093d0
> Author: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> Date: Mon Sep 27 18:48:01 2021 -0400
>
> Re-enable contrib/bloom's TAP tests.
>
> rushabh(at)rushabh:postgresql$ git apply v36-0001-Refactor-some-end-of-recovery-code-out-of-Startu.patch
> rushabh(at)rushabh:postgresql$ git apply v36-0002-Postpone-some-end-of-recovery-operations-relatin.patch
> rushabh(at)rushabh:postgresql$ git apply v36-0003-Create-XLogAcceptWrites-function-with-code-from-.patch
> v36-0003-Create-XLogAcceptWrites-function-with-code-from-.patch:34: space before tab in indent.
> /*
> v36-0003-Create-XLogAcceptWrites-function-with-code-from-.patch:38: space before tab in indent.
> */
> v36-0003-Create-XLogAcceptWrites-function-with-code-from-.patch:39: space before tab in indent.
> Insert->fullPageWrites = lastFullPageWrites;
> warning: 3 lines add whitespace errors.
> rushabh(at)rushabh:postgresql$ git apply v36-0004-Remove-dependencies-on-startup-process-specifica.patch
>
> There are whitespace errors on patch 0003.
>

Fixed.

>>
>> It seems to me that the approach you're pursuing here can't work,
>> because the long-term goal is to get to a place where, if the system
>> starts up read-only, XLogAcceptWrites() might not be called until
>> later, after StartupXLOG() has exited. But in that case the control
>> file state would be DB_IN_PRODUCTION. But my idea of using
>> RecoveryInProgress() won't work either, because we set
>> RECOVERY_STATE_DONE just after we set DB_IN_PRODUCTION. Put
>> differently, the question we want to answer is not "are we in recovery
>> now?" but "did we perform recovery?". After studying the code a bit, I
>> think a good test might be
>> !XLogRecPtrIsInvalid(XLogCtl->lastReplayedEndRecPtr). If InRecovery
>> gets set to true, then we're certain to enter the if (InRecovery)
>> block that contains the main redo loop. And that block unconditionally
>> does XLogCtl->lastReplayedEndRecPtr = XLogCtl->replayEndRecPtr. I
>> think that replayEndRecPtr can't be 0 because it's supposed to
>> represent the record we're pretending to have last replayed, as
>> explained by the comments. And while lastReplayedEndRecPtr will get
>> updated later as we replay more records, I think it will never be set
>> back to 0. It's only going to increase, as we replay more records. On
>> the other hand if InRecovery = false then we'll never change it, and
>> it seems that it starts out as 0.
>>

Understood, used lastReplayedEndRecPtr but in 0002 patch for the
aforesaid reason.

>> I was hoping to have more time today to comment on 0004, but the day
>> seems to have gotten away from me. One quick thought is that it looks
>> a bit strange to be getting EndOfLog from GetLastSegSwitchData() which
>> returns lastSegSwitchLSN while getting EndOfLogTLI from replayEndTLI
>> ... because there's also replayEndRecPtr, which seems to go with
>> replayEndTLI. It feels like we should use a source for the TLI that
>> clearly matches the source for the corresponding LSN, unless there's
>> some super-good reason to do otherwise.

Agreed, that would be the right thing, but on the latest master head
that might not be the right thing to use because of commit #
ff9f111bce24 that has introduced the following code that changes the
EndOfLog that could be different from replayEndRecPtr:

/*
* Actually, if WAL ended in an incomplete record, skip the parts that
* made it through and start writing after the portion that persisted.
* (It's critical to first write an OVERWRITE_CONTRECORD message, which
* we'll do as soon as we're open for writing new WAL.)
*/
if (!XLogRecPtrIsInvalid(missingContrecPtr))
{
Assert(!XLogRecPtrIsInvalid(abortedRecPtr));
EndOfLog = missingContrecPtr;
}

With this commit, we have got two new global variables. First,
missingContrecPtr is an EndOfLog which gets stored in shared memory at
few places, and the other one abortedRecPtr that is needed in
XLogAcceptWrite(), which I have exported into shared memory.

Regards,
Amul

Attachment Content-Type Size
v37-0004-Remove-dependencies-on-startup-process-specifica.patch application/x-patch 8.2 KB
v37-0002-Postpone-some-end-of-recovery-operations-relatin.patch application/x-patch 4.3 KB
v37-0001-Refactor-some-end-of-recovery-code-out-of-Startu.patch application/x-patch 12.5 KB
v37-0003-Create-XLogAcceptWrites-function-with-code-from-.patch application/x-patch 4.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2021-10-05 10:50:12 Re: Next Steps with Hash Indexes
Previous Message Simon Riggs 2021-10-05 10:38:02 Re: Next Steps with Hash Indexes