Re: [Patch] ALTER SYSTEM READ ONLY

From: Amul Sul <sulamul(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Prabhat Sahu <prabhat(dot)sahu(at)enterprisedb(dot)com>
Subject: Re: [Patch] ALTER SYSTEM READ ONLY
Date: 2021-07-28 11:32:33
Message-ID: CAAJ_b95gaLs+poY4iwnYf9n_1RpxNrycOOUQHf+zoxvOu0BUNw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jul 28, 2021 at 4:37 PM Amul Sul <sulamul(at)gmail(dot)com> wrote:
>
> On Wed, Jul 28, 2021 at 2:26 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> >
> > On Fri, Jul 23, 2021 at 4:03 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > > My 0003 is where I see some lingering problems. It creates
> > > XLogAcceptWrites(), moves the appropriate stuff there, and doesn't
> > > need the xlogreader. But it doesn't really solve the problem of how
> > > checkpointer.c would be able to call this function with proper
> > > arguments. It is at least better in not needing two arguments to
> > > decide what to do, but how is checkpointer.c supposed to know what to
> > > pass for xlogaction? Worse yet, how is checkpointer.c supposed to know
> > > what to pass for EndOfLogTLI and EndOfLog?
> >
> > On further study, I found another problem: the way my patch set leaves
> > things, XLogAcceptWrites() depends on ArchiveRecoveryRequested, which
> > will not be correctly initialized in any process other than the
> > startup process. So CleanupAfterArchiveRecovery(EndOfLogTLI, EndOfLog)
> > would just be skipped. Your 0001 seems to have the same problem. You
> > added Assert(AmStartupProcess()) to the inside of the if
> > (ArchiveRecoveryRequested) block, but that doesn't fix anything.
> > Outside the startup process, ArchiveRecoveryRequested will always be
> > false, but the point is that the associated stuff should be done if
> > ArchiveRecoveryRequested would have been true in the startup process.
> > Both of our patch sets leave things in a state where that would never
> > happen, which is not good. Unless I'm missing something, it seems like
> > maybe you didn't test your patches to verify that, when the
> > XLogAcceptWrites() call comes from the checkpointer, all the same
> > things happen that would have happened had it been called from the
> > startup process. That would be a really good thing to have tested
> > before posting your patches.
> >
>
> My bad, I am extremely sorry about that. I usually do test my patches,
> but somehow I failed to test this change due to manually testing the
> whole ASRO feature and hurrying in posting the newest version.
>
> I will try to be more careful next time.
>

I was too worried about how I could miss that & after thinking more
about that, I realized that the operation for ArchiveRecoveryRequested
is never going to be skipped in the startup process and that never
left for the checkpoint process to do that later. That is the reason
that assert was added there.

When ArchiveRecoveryRequested, the server will no longer be in
the wal prohibited mode, we implicitly change the state to
wal-permitted. Here is the snip from the 0003 patch:

@@ -6614,13 +6629,30 @@ StartupXLOG(void)
(errmsg("starting archive recovery")));
}

- /*
- * Take ownership of the wakeup latch if we're going to sleep during
- * recovery.
- */
if (ArchiveRecoveryRequested)
+ {
+ /*
+ * Take ownership of the wakeup latch if we're going to sleep during
+ * recovery.
+ */
OwnLatch(&XLogCtl->recoveryWakeupLatch);

+ /*
+ * Since archive recovery is requested, we cannot be in a wal prohibited
+ * state.
+ */
+ if (ControlFile->wal_prohibited)
+ {
+ /* No need to hold ControlFileLock yet, we aren't up far enough */
+ ControlFile->wal_prohibited = false;
+ ControlFile->time = (pg_time_t) time(NULL);
+ UpdateControlFile();
+
+ ereport(LOG,
+ (errmsg("clearing WAL prohibition because the system is in archive
recovery")));
+ }
+ }
+

> > As far as EndOfLogTLI is concerned, there are, somewhat annoyingly,
> > several TLIs stored in XLogCtl. None of them seem to be precisely the
> > same thing as EndLogTLI, but I am hoping that replayEndTLI is close
> > enough. I found out pretty quickly through testing that replayEndTLI
> > isn't always valid -- it ends up 0 if we don't enter recovery. That's
> > not really a problem, though, because we only need it to be valid if
> > ArchiveRecoveryRequested. The code that initializes and updates it
> > seems to run whenever InRecovery = true, and ArchiveRecoveryRequested
> > = true will force InRecovery = true. So it looks to me like
> > replayEndTLI will always be initialized in the cases where we need a
> > value. It's not yet entirely clear to me if it has to have the same
> > value as EndOfLogTLI. I find this code comment quite mysterious:
> >
> > /*
> > * EndOfLogTLI is the TLI in the filename of the XLOG segment containing
> > * the end-of-log. It could be different from the timeline that EndOfLog
> > * nominally belongs to, if there was a timeline switch in that segment,
> > * and we were reading the old WAL from a segment belonging to a higher
> > * timeline.
> > */
> > EndOfLogTLI = xlogreader->seg.ws_tli;
> >
> > The thing is, if we were reading old WAL from a segment belonging to a
> > higher timeline, wouldn't we have switched to that new timeline?
>
> AFAIUC, by browsing the code, yes, we are switching to the new
> timeline. Along with lastReplayedTLI, lastReplayedEndRecPtr is also
> the same as the EndOfLog that we needed when ArchiveRecoveryRequested
> is true.
>
> I went through the original commit 7cbee7c0a1db and the thread[1] but
> didn't find any related discussion for that.
>
> > Suppose we want WAL segment 246 from TLI 1, but we don't have that
> > segment on TLI 1, only TLI 2. Well, as far as I know, for us to use
> > the TLI 2 version, we'd need to have TLI 2 in the history of the
> > recovery_target_timeline. And if that is the case, then we would have
> > to replay through the record where the timeline changes. And if we do
> > that, then the discrepancy postulated by the comment cannot still
> > exist by the time we reach this code, because this code is only
> > reached after we finish WAL redo. So I'm baffled as to how this can
> > happen, but considering how many cases there are in this code, I sure
> > can't promise that it doesn't. The fact that we have few tests for any
> > of this doesn't help either.
>
> I am not an expert in this area, but will try to spend some more time
> on understanding and testing.
>
> 1] postgr.es/m/555DD101.7080209@iki.fi
>
> Regards,
> Amul

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message torikoshia 2021-07-28 11:44:00 Re: RFC: Logging plan of the running query
Previous Message Daniel Gustafsson 2021-07-28 11:26:23 Re: perlcritic: prohibit map and grep in void conext