Re: corruption of WAL page header is never reported

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: masao(dot)fujii(at)oss(dot)nttdata(dot)com
Cc: nagata(at)sraoss(dot)co(dot)jp, ranier(dot)vf(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: corruption of WAL page header is never reported
Date: 2021-09-02 04:17:16
Message-ID: 20210902.131716.1067877126401208860.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Thu, 2 Sep 2021 12:19:25 +0900, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote in
>
>
> On 2021/07/19 18:52, Yugo NAGATA wrote:
> > Well, I think my first patch was not wrong. The difference with the
> > latest
> > patch is just whether we perform the additional check when we are not
> > in
> > standby mode or not. The behavior is basically the same although which
> > function
> > detects and prints the page-header error in cases of crash recovery is
> > different.
>
> Yes, so which patch do you think is better? I like your version
> because there seems no reason why XLogPageRead() should skip
> XLogReaderValidatePageHeader() when not in standby mode.

Did you read the comment just above?

xlog.c:12523
> * Check the page header immediately, so that we can retry immediately if
> * it's not valid. This may seem unnecessary, because XLogReadRecord()
> * validates the page header anyway, and would propagate the failure up to
> * ReadRecord(), which would retry. However, there's a corner case with
> * continuation records, if a record is split across two pages such that

So when not in standby mode, the same check is performed by xlogreader
which has the responsibility to validate the binary data read by
XLogPageRead. The page-header validation is a compromise to save a
specific case.

> Also I'm tempted to move ereport() and reset of errmsg_buf to
> under next_record_is_invalid as follows. That is, in standby mode
> whenever we find an invalid record and retry reading WAL page
> in XLogPageRead(), we report the error message and reset it.
> For now in XLogPageRead(), only XLogReaderValidatePageHeader()
> sets errmsg_buf, but in the future other code or function doing that
> may be added. For that case, the following change seems more elegant.
> Thought?

I don't think it is good choice to conflate read-failure and header
validation failure from the view of modularity.

> * shouldn't be a big deal from a performance point of view.
> */
> if (!XLogReaderValidatePageHeader(xlogreader, targetPagePtr, readBuf))
> - {
> - /* reset any error XLogReaderValidatePageHeader() might have set */
> - xlogreader->errormsg_buf[0] = '\0';
> goto next_record_is_invalid;
> - }
> return readLen;
> @@ -12517,7 +12513,17 @@ next_record_is_invalid:
> /* In standby-mode, keep trying */
> if (StandbyMode)
> + {
> + if (xlogreader->errormsg_buf[0])
> + {
> + ereport(emode_for_corrupt_record(emode, EndRecPtr),
> + (errmsg_internal("%s", xlogreader->errormsg_buf)));
> +
> + /* reset any error XLogReaderValidatePageHeader() might have set */
> + xlogreader->errormsg_buf[0] = '\0';
> + }
> goto retry;
> + }
> else
> return -1;
> }

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2021-09-02 04:19:35 Re: row filtering for logical replication
Previous Message Amit Kapila 2021-09-02 03:43:17 Re: row filtering for logical replication