Re: Make mesage at end-of-recovery less scary.

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: bossartn(at)amazon(dot)com
Cc: david(at)pgmasters(dot)net, peter(dot)eisentraut(at)2ndquadrant(dot)com, andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, pgsql-hackers(at)lists(dot)postgresql(dot)org, jtc331(at)gmail(dot)com, robertmhaas(at)gmail(dot)com
Subject: Re: Make mesage at end-of-recovery less scary.
Date: 2021-11-08 05:59:46
Message-ID: 20211108.145946.1513355777186578917.horikyota.ntt@gmail.com
Lists: pgsql-hackers

At Fri, 22 Oct 2021 17:54:40 +0000, "Bossart, Nathan" <bossartn(at)amazon(dot)com> wrote in
> On 3/4/21, 10:50 PM, "Kyotaro Horiguchi" <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> > As the result, the following messages are emitted with the attached.
>
> I'd like to voice my support for this effort, and I intend to help
> review the patch. It looks like the latest patch no longer applies,
> so I've marked the commitfest entry [0] as waiting-on-author.
>
> Nathan
>
> [0] https://commitfest.postgresql.org/35/2490/

Sorry for being late to reply. I rebased this to the current master.

- rebased

- use LSN_FORMAT_ARGS instead of bare shift and mask.

- v4 made walreceiver exit immediately on disconnection. I probably
wanted to avoid seeing a FATAL message on the standby after the primary
dies. However, that is a separate issue, and that change was plainly
wrong. v5 just removes the "end-of-WAL" part from the message, which
duplicates what the startup process emits.

- add a new error message "missing contrecord at %X/%X". Maybe this
should be regarded as a leftover of the contrecord patch. In the
attached patch the "%X/%X" is the LSN of the current record. The log
messages look like this (026_overwrite_contrecord).

LOG: redo starts at 0/1486CB8
WARNING: missing contrecord at 0/1FFC2E0
LOG: consistent recovery state reached at 0/1FFC2E0
LOG: started streaming WAL from primary at 0/2000000 on timeline 1
LOG: successfully skipped missing contrecord at 0/1FFC2E0, overwritten at 2021-11-08 14:50:11.969952+09
CONTEXT: WAL redo at 0/2000028 for XLOG/OVERWRITE_CONTRECORD: lsn 0/1FFC2E0; time 2021-11-08 14:50:11.969952+09
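
For reference, the "%X/%X" formatting behind the LSN_FORMAT_ARGS item above can be sketched as follows. This is a standalone illustration, not code from the patch: the XLogRecPtr typedef is a stand-in for the PostgreSQL type, and SKETCH_LSN_FORMAT_ARGS and format_lsn() are hypothetical names that only mirror the shape of the real macro (two 32-bit halves passed to "%X/%X").

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for PostgreSQL's XLogRecPtr: a 64-bit WAL position. */
typedef uint64_t XLogRecPtr;

/*
 * Hypothetical macro mirroring the shape of LSN_FORMAT_ARGS: expands to
 * the high and low 32 bits of an LSN, suitable for a "%X/%X" format.
 */
#define SKETCH_LSN_FORMAT_ARGS(lsn) \
	((unsigned int) ((lsn) >> 32)), ((unsigned int) ((lsn) & 0xFFFFFFFFu))

/* Hypothetical helper: write an LSN into buf in the usual "hi/lo" hex form. */
static int
format_lsn(char *buf, size_t buflen, XLogRecPtr lsn)
{
	return snprintf(buf, buflen, "%X/%X", SKETCH_LSN_FORMAT_ARGS(lsn));
}
```

Keeping the shift and mask in one macro means each message only has to name the LSN once, instead of repeating the two halves by hand at every call site.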

While checking the behavior in the missing-contrecord case, I
noticed that emode_for_corrupt_record() doesn't work as expected, since
readSource is reset to XLOG_FROM_ANY after a read failure. We could
remember the last failed source, but pg_wal should already have been
visited if a page read error happened, so I changed the function so that
it treats XLOG_FROM_ANY the same way as XLOG_FROM_PG_WAL.

(Otherwise we see a "LOG: reached end-of-WAL at .." message after the
"WARNING: missing contrecord at.." message.)
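
The emode_for_corrupt_record() change described above can be sketched roughly like this. It is a standalone approximation under stated assumptions, not the patch itself: the function demotes a repeated LOG-level complaint about the same record to DEBUG1 when the source is pg_wal, and the sketch extends that check to XLOG_FROM_ANY. The source is passed as a parameter here for testability (the real function reads the file-level readSource), and the SKETCH_* level constants and the sketch_ function name are hypothetical.

```c
#include <stdint.h>

/* Stand-in for PostgreSQL's XLogRecPtr: a 64-bit WAL position. */
typedef uint64_t XLogRecPtr;

/* Stand-in for the WAL source enumeration used by recovery. */
typedef enum
{
	XLOG_FROM_ANY = 0,
	XLOG_FROM_ARCHIVE,
	XLOG_FROM_PG_WAL,
	XLOG_FROM_STREAM
} XLogSource;

/* Hypothetical stand-ins for the elog levels involved. */
enum { SKETCH_DEBUG1 = 14, SKETCH_LOG = 15 };

/*
 * Sketch: if the failed read came from pg_wal, or the source has already
 * been reset to XLOG_FROM_ANY, report a given record position at LOG only
 * once and demote repeated complaints about it to DEBUG1.
 */
static int
sketch_emode_for_corrupt_record(int emode, XLogRecPtr rec_ptr, XLogSource source)
{
	static XLogRecPtr last_complaint = 0;

	if ((source == XLOG_FROM_PG_WAL || source == XLOG_FROM_ANY) &&
		emode == SKETCH_LOG)
	{
		if (rec_ptr == last_complaint)
			emode = SKETCH_DEBUG1;	/* already reported this position */
		else
			last_complaint = rec_ptr;
	}
	return emode;
}
```

With this shape, a second complaint about 0/1FFC2E0 arriving with the source reset to XLOG_FROM_ANY is demoted in the same way as one arriving with XLOG_FROM_PG_WAL.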

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
v5-0001-Make-End-Of-Recovery-error-less-scary.patch text/x-patch 9.5 KB
