Re: [Patch] ALTER SYSTEM READ ONLY

From: Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amul Sul <sulamul(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Prabhat Sahu <prabhat(dot)sahu(at)enterprisedb(dot)com>
Subject: Re: [Patch] ALTER SYSTEM READ ONLY
Date: 2021-10-04 08:27:01
Message-ID: CAGPqQf2n0aeeKMv+mA0zpgkqkamEVMCZE_VL74hxCe9hSNOQNg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Oct 1, 2021 at 2:29 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Thu, Sep 30, 2021 at 7:59 AM Amul Sul <sulamul(at)gmail(dot)com> wrote:
> > To find the value of InRecovery after we clear it, patch still uses
> > ControlFile's DBState, but now the check condition changed to a more
> > specific one which is less confusing.
> >
> > In casual off-list discussion, the point was made to check
> > SharedRecoveryState to find out the InRecovery value afterward, and
> > check that using RecoveryInProgress(). But we can't depend on
> > SharedRecoveryState because at the start it gets initialized to
> > RECOVERY_STATE_CRASH irrespective of InRecovery that happens later.
> > Therefore, we can't use RecoveryInProgress() which always returns
> > true if SharedRecoveryState != RECOVERY_STATE_DONE.
>
> Uh, this change has crept into 0002, but it should be in 0004 with the
> rest of the changes to remove dependencies on variables specific to
> the startup process. Like I said before, we should really be trying to
> separate code movement from functional changes. Also, 0002 doesn't
> actually apply for me. Did you generate these patches with 'git
> format-patch'?
>
> [rhaas pgsql]$ patch -p1 <
> ~/Downloads/v36-0001-Refactor-some-end-of-recovery-code-out-of-Startu.patch
> patching file src/backend/access/transam/xlog.c
> Hunk #1 succeeded at 889 (offset 9 lines).
> Hunk #2 succeeded at 939 (offset 12 lines).
> Hunk #3 succeeded at 5734 (offset 37 lines).
> Hunk #4 succeeded at 8038 (offset 70 lines).
> Hunk #5 succeeded at 8248 (offset 70 lines).
> [rhaas pgsql]$ patch -p1 <
> ~/Downloads/v36-0002-Postpone-some-end-of-recovery-operations-relatin.patch
> patching file src/backend/access/transam/xlog.c
> Reversed (or previously applied) patch detected! Assume -R? [n]
> Apply anyway? [n] y
> Hunk #1 FAILED at 7954.
> Hunk #2 succeeded at 8079 (offset 70 lines).
> 1 out of 2 hunks FAILED -- saving rejects to file
> src/backend/access/transam/xlog.c.rej
> [rhaas pgsql]$ git reset --hard
> HEAD is now at b484ddf4d2 Treat ETIMEDOUT as indicating a
> non-recoverable connection failure.
> [rhaas pgsql]$ patch -p1 <
> ~/Downloads/v36-0002-Postpone-some-end-of-recovery-operations-relatin.patch
> patching file src/backend/access/transam/xlog.c
> Reversed (or previously applied) patch detected! Assume -R? [n]
> Apply anyway? [n]
> Skipping patch.
> 2 out of 2 hunks ignored -- saving rejects to file
> src/backend/access/transam/xlog.c.rej
>
>
I tried to apply the patch on the master branch head and it's failing
with conflicts.

Later applied patch on below commit and it got applied cleanly:

commit 7d1aa6bf1c27bf7438179db446f7d1e72ae093d0
Author: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Date: Mon Sep 27 18:48:01 2021 -0400

Re-enable contrib/bloom's TAP tests.

rushabh(at)rushabh:postgresql$ git apply
v36-0001-Refactor-some-end-of-recovery-code-out-of-Startu.patch
rushabh(at)rushabh:postgresql$ git apply
v36-0002-Postpone-some-end-of-recovery-operations-relatin.patch
rushabh(at)rushabh:postgresql$ git apply
v36-0003-Create-XLogAcceptWrites-function-with-code-from-.patch
v36-0003-Create-XLogAcceptWrites-function-with-code-from-.patch:34: space
before tab in indent.
/*
v36-0003-Create-XLogAcceptWrites-function-with-code-from-.patch:38: space
before tab in indent.
*/
v36-0003-Create-XLogAcceptWrites-function-with-code-from-.patch:39: space
before tab in indent.
Insert->fullPageWrites = lastFullPageWrites;
warning: 3 lines add whitespace errors.
rushabh(at)rushabh:postgresql$ git apply
v36-0004-Remove-dependencies-on-startup-process-specifica.patch

There are whitespace errors on patch 0003.

> It seems to me that the approach you're pursuing here can't work,
> because the long-term goal is to get to a place where, if the system
> starts up read-only, XLogAcceptWrites() might not be called until
> later, after StartupXLOG() has exited. But in that case the control
> file state would be DB_IN_PRODUCTION. But my idea of using
> RecoveryInProgress() won't work either, because we set
> RECOVERY_STATE_DONE just after we set DB_IN_PRODUCTION. Put
> differently, the question we want to answer is not "are we in recovery
> now?" but "did we perform recovery?". After studying the code a bit, I
> think a good test might be
> !XLogRecPtrIsInvalid(XLogCtl->lastReplayedEndRecPtr). If InRecovery
> gets set to true, then we're certain to enter the if (InRecovery)
> block that contains the main redo loop. And that block unconditionally
> does XLogCtl->lastReplayedEndRecPtr = XLogCtl->replayEndRecPtr. I
> think that replayEndRecPtr can't be 0 because it's supposed to
> represent the record we're pretending to have last replayed, as
> explained by the comments. And while lastReplayedEndRecPtr will get
> updated later as we replay more records, I think it will never be set
> back to 0. It's only going to increase, as we replay more records. On
> the other hand if InRecovery = false then we'll never change it, and
> it seems that it starts out as 0.
>
> I was hoping to have more time today to comment on 0004, but the day
> seems to have gotten away from me. One quick thought is that it looks
> a bit strange to be getting EndOfLog from GetLastSegSwitchData() which
> returns lastSegSwitchLSN while getting EndOfLogTLI from replayEndTLI
> ... because there's also replayEndRecPtr, which seems to go with
> replayEndTLI. It feels like we should use a source for the TLI that
> clearly matches the source for the corresponding LSN, unless there's
> some super-good reason to do otherwise.
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com
>
>
>

--
Rushabh Lathia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2021-10-04 08:27:44 Re: Incorrect snapshots while promoting hot standby node when 2PC is used
Previous Message Bharath Rupireddy 2021-10-04 08:19:23 replace InvalidXid(a macro that doesn't exist) with InvalidTransactionId(a macro that exists) in code comments