Re: Make mesage at end-of-recovery less scary.

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: ashu(dot)coek88(at)gmail(dot)com
Cc: pashkin(dot)elfe(at)gmail(dot)com, michael(at)paquier(dot)xyz, bossartn(at)amazon(dot)com, david(at)pgmasters(dot)net, peter(dot)eisentraut(at)2ndquadrant(dot)com, andres(at)anarazel(dot)de, pgsql-hackers(at)lists(dot)postgresql(dot)org, jtc331(at)gmail(dot)com, robertmhaas(at)gmail(dot)com
Subject: Re: Make mesage at end-of-recovery less scary.
Date: 2022-02-10 06:17:36
Message-ID: 20220210.151736.42160925922013414.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Wed, 9 Feb 2022 17:31:02 +0530, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com> wrote in
> On Wed, Feb 9, 2022 at 1:14 PM Kyotaro Horiguchi
> <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> > This means archive-recovery is requested but not started yet. That is,
> > we've just finished crash recovery. The existing comment cited
> > together is mentioning that.
> >
> > At the end of PITR (or archive recovery), the other code works.
> >
>
> This is quite understandable, the point here is that the message that
> we are emitting says, we have just finished reading the wal files in
> the pg_wal directory during crash recovery and are now entering
> archive recovery when we are actually doing point-in-time recovery
> which seems a bit misleading.

Here is the messages.

> 2022-02-08 18:00:44.367 IST [86185] LOG: starting point-in-time
> recovery to WAL location (LSN) "0/5227790"
> 2022-02-08 18:00:44.368 IST [86185] LOG: database system was not
> properly shut down; automatic recovery in progress
> 2022-02-08 18:00:44.369 IST [86185] LOG: redo starts at 0/14DC8D8
> 2022-02-08 18:00:44.978 IST [86185] DEBUG1: reached end of WAL at
> 0/3FFFFD0 on timeline 1 in pg_wal during crash recovery, entering
> archive recovery

In the first place the last DEBUG1 is not on my part, but one of the
messages added by this patch says the same thing. Is your point that
archive recovery is different thing from PITR? In regard to the
difference, I think PITR is a form of archive recovery.

That being said, after some thoughts on this, I changed my mind that
we don't need to say what operation was being performed at the
end-of-WAL. So in the attached the end-of-WAL message is not
accompanied by the kind of recovery.

> LOG: reached end of WAL at 0/3000000 on timeline 1

I removed the archive-source part along with the operation mode.
Because it make the message untranslatable. It is now very simple but
seems enough.

While working on this, I noticed that we need to set EndOfWAL when
WaitForWALToBecomeAvailable returned with failure. That means the
file does not exist at all so it is a kind of end-of-WAL. In that
sense the following existing comment in ReadRecord is a bit wrong.

> * We only end up here without a message when XLogPageRead()
> * failed - in that case we already logged something. In
> * StandbyMode that only happens if we have been triggered, so we
> * shouldn't loop anymore in that case.

Actually there's a case we get there without a message and without
logged something when a segment file is not found unless we're in
standby mode.

> > Well. I guess that the "automatic recovery" is ambiguous. Does it
> > make sense if the second line were like the follows instead?
> >
> > + 2022-02-08 18:00:44.368 IST [86185] LOG: database system was not properly shut down; crash recovery in progress
> >
>
> Well, according to me the current message looks fine.

Good to hear. (In the previos version I modified the message by accident..)

> $chkptfile is declared twice in the same scope. We can probably remove
> the first one.

Ugh.. Fixed. (I wonder why Perl doesn't complain on this..)

In this version 12 I made the following changes.

- Rewrote (halfly reverted) a comment in ReadRecord

- Simplified the "reached end of WAL" message by removing recovery
mode and WAL source in ReadRecord.

- XLogPageRead sets EndOfWAL flag in the ENOENT case.

- Removed redundant declaration of the same variable in TAP script.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v12-0001-Make-End-Of-Recovery-error-less-scary.patch text/x-patch 16.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2022-02-10 06:26:44 Re: Replacing TAP test planning with done_testing()
Previous Message Andres Freund 2022-02-10 05:17:33 Re: Move replication slot structures/enums/macros to a new header file for better usability by external tools/modules