Some error messages are omitted while recovery.

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Some error messages are omitted while recovery.
Date: 2020-12-14 09:04:37
Message-ID: 20201214.180437.1423912579041110544.horikyota.ntt@gmail.com
Lists: pgsql-hackers

At Mon, 14 Dec 2020 16:48:05 +0900, Michael Paquier <michael(at)paquier(dot)xyz> wrote in
> On Mon, Dec 14, 2020 at 11:34:51AM +0900, Kyotaro Horiguchi wrote:
> > Apart from this issue, while checking that, I noticed that if server
> > starts having WALs from a server of a different systemid, the server
> > stops with obscure messages.
>
> Wouldn't it be better to discuss that on a separate thread? I have
> mostly missed your message here.

Right. Here is a copy of that message. Thanks for the
suggestion!

=====
During another discussion related to xlogreader[2], I noticed that
if a server starts up with WAL files from a server with a different
system identifier, it stops with obscure messages.

> LOG: database system was shut down at 2020-12-14 10:36:02 JST
> LOG: invalid primary checkpoint record
> PANIC: could not locate a valid checkpoint record

The cause is that XLogPageRead() erases the error message set by
XLogReaderValidatePageHeader(). As the comment just above that code
says, this is required to continue replication in a certain
situation. The code aims to let replication continue when the first
half of a continued record has been removed on the primary, so we
don't need that workaround unless we're in standby mode. If we run
the rescue code only while StandbyMode is set, we get the correct
error message.

> JST LOG: database system was shut down at 2020-12-14 10:36:02 JST
> LOG: WAL file is from different database system: WAL file database system identifier is 6905923817995618754, pg_control database system identifier is 6905924227171453468
> JST LOG: invalid primary checkpoint record
> JST PANIC: could not locate a valid checkpoint record
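
The idea above can be illustrated with a small standalone model. This is
only a sketch and not the attached patch: ToyReader,
toy_validate_page_header() and toy_page_read() are hypothetical stand-ins
invented for illustration, and the gate_on_standby flag exists only so
that the current and proposed behaviors can be compared side by side.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the reader state (not XLogReaderState). */
typedef struct ToyReader
{
    char errormsg_buf[256];
} ToyReader;

/*
 * Toy validator: when the system identifiers differ, store an error
 * message in the reader and report failure.
 */
bool
toy_validate_page_header(ToyReader *r,
                         unsigned long long wal_sysid,
                         unsigned long long control_sysid)
{
    assert(r != NULL);

    if (wal_sysid != control_sysid)
    {
        snprintf(r->errormsg_buf, sizeof(r->errormsg_buf),
                 "WAL file is from different database system");
        return false;
    }
    return true;
}

/*
 * Toy page read.  With gate_on_standby == false the validator's message
 * is always erased on failure (the behavior described as the cause).
 * With gate_on_standby == true it is erased only in standby mode, so the
 * message survives otherwise.
 */
bool
toy_page_read(ToyReader *r, bool standby_mode, bool gate_on_standby,
              unsigned long long wal_sysid,
              unsigned long long control_sysid)
{
    r->errormsg_buf[0] = '\0';

    if (!toy_validate_page_header(r, wal_sysid, control_sysid))
    {
        /* Erase the message only where a retry is meant to happen. */
        if (standby_mode || !gate_on_standby)
            r->errormsg_buf[0] = '\0';
        return false;
    }
    return true;
}
```

In this model, calling toy_page_read() with standby_mode = false and
gate_on_standby = true leaves the "different database system" message in
errormsg_buf for the caller to report, while standby_mode = true still
clears it so that a retry can proceed quietly.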

I confirmed that commit 0668719801 still works in its intended context
using the steps shown in [1].

[1]: https://www.postgresql.org/message-id/flat/CACJqAM3xVz0JY1XFDKPP%2BJoJAjoGx%3DGNuOAshEDWCext7BFvCQ%40mail.gmail.com

[2]: https://www.postgresql.org/message-id/flat/2B4510B2-3D70-4990-BFE3-0FE64041C08A%40amazon.com

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment: 0001-Don-t-cancel-invalid-page-header-error-in-unwanted-s.patch (text/x-patch, 1.3 KB)
