Re: Wall shiping replica failed to recover database with error: invalid contrecord length 1956 at FED/38FFE208

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: sfrost(at)snowman(dot)net, zeleny(dot)ales(at)gmail(dot)com, pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Wall shiping replica failed to recover database with error: invalid contrecord length 1956 at FED/38FFE208
Date: 2020-01-17 08:26:55
Message-ID: 20200117.172655.1384889922565817808.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Thank you, and sorry for overlooking your comment.

At Thu, 14 Nov 2019 12:28:13 -0500, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote in
> Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> writes:
> > At Wed, 2 Oct 2019 19:24:02 -0400, Stephen Frost <sfrost(at)snowman(dot)net> wrote in
> >> * Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> >>> Yeah, those messages are all pretty ancient, from when WAL was new and not
> >>> to be trusted much. Perhaps the thing to do is move the existing info
> >>> into DETAIL and make the primary message be something like "reached
> >>> apparent end of WAL stream".
>
> >> Yes, +1 on that.
>
> > What do you think about this?
>
> It seems overcomplicated. Why do you need to reach into
> emode_for_corrupt_record's private state?
> emode_for_corrupt_record's private state? I think we could just
> change the message text without that, and leave the emode
> machinations as-is.

I tried to avoid messages at the same LSN. But it is done by
emode_for_corrupt_record by reducing error level. I reverted that
part.

> I don't understand the restriction to "if (RecPtr == InvalidXLogRecPtr)"
> either? Maybe that's fine but the comment fails to explain it.

"When we end up here while reading successive recored" meant that, but
it is not explicit. If RecPtr was not invalid, it means that the
caller is requesting for a specific record that should exist. It is
not end-of-wal at all. I rewrote the comment.

> Another point is that, as the comment for emode_for_corrupt_record
> notes, we don't really want to consider that we've hit end-of-WAL
> when reading any source except XLOG_FROM_PG_WAL. So I think the
> change of message ought to include a test of the source, but this
> doesn't.

Maybe no. The function just mutes messages for repeated error on
XLOG_FROM_WAL. We consider end-of-WAL while XLOG_FROM_ARCHIVE. In
this version the "reached end of WAL" is not emitted when
emode_for_corrupt_record decided not to show the message.

> Also, the business with an "operation" string violates the message
> translatability guideline about "don't assemble a message out of
> phrases". If we want to have that extra detail, it's better just
> to make three separate ereport() calls with separately translatable
> messages.

Though I thought that the operation is just a noun and it is safely
processed separately, I followed your comment. And this version takes
the more verbose one of the previous two.

> Also, it seems wrong that the page TLI check, just below, is where
> it is and isn't part of the main set of page header sanity checks.
> That's sort of unrelated to this patch, except not really, because
> shouldn't a failure of that test also be treated as an "end of WAL"
> condition?

It seems checking if xlogreader did something wrong, since it is
pluggable. I'm not sure there is any concrete reason for that,
though. As for recovery, it is actually redundant because
XLogFileReadAnyTLI already checked that for the record, but it doesn't
matter.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
0001-Change-end-of-WAL-message-less-scary.patch text/x-patch 3.3 KB

In response to

Browse pgsql-general by date

  From Date Subject
Next Message ramesh penumalli 2020-01-17 08:47:08 postgresql commands(psql,createdb,dropdb) are not working from shell script
Previous Message Arnaud L. 2020-01-17 07:43:44 Re: minimal wal_level on subscriber