Re: Make mesage at end-of-recovery less scary.

From: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: pashkin(dot)elfe(at)gmail(dot)com, Michael Paquier <michael(at)paquier(dot)xyz>, "Bossart, Nathan" <bossartn(at)amazon(dot)com>, David Steele <david(at)pgmasters(dot)net>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, jtc331(at)gmail(dot)com, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Make mesage at end-of-recovery less scary.
Date: 2022-02-14 14:44:11
Message-ID: CAE9k0P=rs4+yAc4=+9Nmm+R_BGFj50wQObhYH2OSC9v7WNbVJw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Thu, Feb 10, 2022 at 11:47 AM Kyotaro Horiguchi
<horikyota(dot)ntt(at)gmail(dot)com> wrote:
>
> At Wed, 9 Feb 2022 17:31:02 +0530, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> wrote in
> > On Wed, Feb 9, 2022 at 1:14 PM Kyotaro Horiguchi
> > <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> > > This means archive-recovery is requested but not started yet. That is,
> > > we've just finished crash recovery. The existing comment cited
> > > together is mentioning that.
> > >
> > > At the end of PITR (or archive recovery), the other code works.
> > >
> >
> > This is quite understandable, the point here is that the message that
> > we are emitting says, we have just finished reading the wal files in
> > the pg_wal directory during crash recovery and are now entering
> > archive recovery when we are actually doing point-in-time recovery
> > which seems a bit misleading.
>
> Here is the messages.
>
> > 2022-02-08 18:00:44.367 IST [86185] LOG: starting point-in-time
> > recovery to WAL location (LSN) "0/5227790"
> > 2022-02-08 18:00:44.368 IST [86185] LOG: database system was not
> > properly shut down; automatic recovery in progress
> > 2022-02-08 18:00:44.369 IST [86185] LOG: redo starts at 0/14DC8D8
> > 2022-02-08 18:00:44.978 IST [86185] DEBUG1: reached end of WAL at
> > 0/3FFFFD0 on timeline 1 in pg_wal during crash recovery, entering
> > archive recovery
>
> In the first place the last DEBUG1 is not on my part, but one of the
> messages added by this patch says the same thing. Is your point that
> archive recovery is different thing from PITR? In regard to the
> difference, I think PITR is a form of archive recovery.
>

No, I haven't tried to compare archive recovery to PITR or vice versa,
instead I was trying to compare crash recovery with PITR. The message
you're emitting says just before entering into the archive recovery is
- "reached end-of-WAL on ... in pg_wal *during crash recovery*,
entering archive recovery". This message is static and can be emitted
not only during crash recovery, but also during PITR. I think we can
remove the "during crash recovery" part from this message, so "reached
the end of WAL at %X/%X on timeline %u in %s, entering archive
recovery". Also I don't think we need format specifier %s here, it can
be hard-coded with pg_wal as in this case we can only enter archive
recovery after reading wal from pg_wal, so current WAL source has to
be pg_wal, isn't it?

> That being said, after some thoughts on this, I changed my mind that
> we don't need to say what operation was being performed at the
> end-of-WAL. So in the attached the end-of-WAL message is not
> accompanied by the kind of recovery.
>
> > LOG: reached end of WAL at 0/3000000 on timeline 1
>
> I removed the archive-source part along with the operation mode.
> Because it make the message untranslatable. It is now very simple but
> seems enough.
>
> While working on this, I noticed that we need to set EndOfWAL when
> WaitForWALToBecomeAvailable returned with failure. That means the
> file does not exist at all so it is a kind of end-of-WAL. In that
> sense the following existing comment in ReadRecord is a bit wrong.
>
> > * We only end up here without a message when XLogPageRead()
> > * failed - in that case we already logged something. In
> > * StandbyMode that only happens if we have been triggered, so we
> > * shouldn't loop anymore in that case.
>
> Actually there's a case we get there without a message and without
> logged something when a segment file is not found unless we're in
> standby mode.
>
> > > Well. I guess that the "automatic recovery" is ambiguous. Does it
> > > make sense if the second line were like the follows instead?
> > >
> > > + 2022-02-08 18:00:44.368 IST [86185] LOG: database system was not properly shut down; crash recovery in progress
> > >
> >
> > Well, according to me the current message looks fine.
>
> Good to hear. (In the previos version I modified the message by accident..)
>
> > $chkptfile is declared twice in the same scope. We can probably remove
> > the first one.
>
> Ugh.. Fixed. (I wonder why Perl doesn't complain on this..)
>
>
> In this version 12 I made the following changes.
>
> - Rewrote (halfly reverted) a comment in ReadRecord
>
> - Simplified the "reached end of WAL" message by removing recovery
> mode and WAL source in ReadRecord.
>
> - XLogPageRead sets EndOfWAL flag in the ENOENT case.
>
> - Removed redundant declaration of the same variable in TAP script.
>

Thanks for the changes. Please note that I am not able to apply the
latest patch on HEAD. Could you please rebase it on HEAD and share the
new version. Thank you.

--
With Regards,
Ashutosh Sharma.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2022-02-14 14:52:16 Re: sockaddr_un.sun_len vs. reality
Previous Message Tom Lane 2022-02-14 14:41:40 Re: pgsql: Add test case for an archive recovery corner case.