corruption of WAL page header is never reported

From: Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>
To: pgsql-hackers(at)postgresql(dot)org
Subject: corruption of WAL page header is never reported
Date: 2021-07-17 19:55:05
Message-ID: 20210718045505.32f463ed6c227111038d8ae4@sraoss.co.jp
Lists: pgsql-hackers

Hello,

I found that corruption of a WAL page header detected during recovery is never
reported in the log messages. If a WAL page header is broken, it is detected in
XLogReaderValidatePageHeader, called from XLogPageRead, but the error message
is always reset and never reported.

	if (!XLogReaderValidatePageHeader(xlogreader, targetPagePtr, readBuf))
	{
		/* reset any error XLogReaderValidatePageHeader() might have set */
		xlogreader->errormsg_buf[0] = '\0';
		goto next_record_is_invalid;
	}

Since commit 06687198018, we call XLogReaderValidatePageHeader here so that
we can check a page header and retry immediately if it's invalid, but the error
message is reset immediately and not reported. I guess the error message is
reset because we might get the right WAL after some retries.
However, I think it is better to report the error for each check in order to let
users know the actual issues found in the WAL.

I attached a patch to fix it in this way.
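
To illustrate the idea only (this is not the attached patch), here is a minimal
standalone sketch of "report the message before resetting it". Everything in it
is a hypothetical stand-in: SketchReader, SKETCH_MAGIC, sketch_validate and
sketch_check_page do not exist in PostgreSQL, and the log buffer takes the
place of a real log call.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical magic value, chosen only for this sketch. */
#define SKETCH_MAGIC 0xABCDu

/* Hypothetical stand-in for the reader state: just an error buffer. */
typedef struct SketchReader
{
	char		errormsg_buf[256];
} SketchReader;

/* Stand-in validator: on failure it leaves a message in errormsg_buf. */
static bool
sketch_validate(SketchReader *reader, unsigned int magic)
{
	if (magic != SKETCH_MAGIC)
	{
		snprintf(reader->errormsg_buf, sizeof(reader->errormsg_buf),
				 "invalid magic number %04X", magic);
		return false;
	}
	return true;
}

/*
 * Validate a page header. On failure, copy the message to logbuf (the
 * sketch's substitute for writing to the server log) and only then reset
 * the reader's buffer, so the caller can still retry with a clean state.
 */
static bool
sketch_check_page(SketchReader *reader, unsigned int magic,
				  char *logbuf, size_t loglen)
{
	if (!sketch_validate(reader, magic))
	{
		if (reader->errormsg_buf[0] != '\0')
			snprintf(logbuf, loglen, "%s", reader->errormsg_buf);
		reader->errormsg_buf[0] = '\0';
		return false;
	}
	return true;
}
```

The point of the ordering is that the retry path is unchanged: the buffer is
still cleared, but the message has been made visible first.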

Or, if we don't want to report an error for each check, and what we want
to check here is just old recycled WAL rather than header corruption itself,
I wonder whether we could check just xlp_pageaddr instead of calling
XLogReaderValidatePageHeader.
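
A rough sketch of what such a narrower check could look like, assuming the
expected page address is already known to the caller. SketchPageHeader and
sketch_pageaddr_matches are hypothetical names for this example; only the
xlp_pageaddr field name and the 64-bit XLogRecPtr type mirror PostgreSQL.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's XLogRecPtr, a 64-bit WAL position. */
typedef uint64_t XLogRecPtr;

/* Hypothetical trimmed-down page header: only the field this check needs. */
typedef struct SketchPageHeader
{
	XLogRecPtr	xlp_pageaddr;	/* WAL address the page claims to hold */
} SketchPageHeader;

/*
 * Return true if the page's recorded address equals the address we expected
 * to read. A page left over in a recycled segment still carries the address
 * it had in the old segment, so it would not match.
 */
static bool
sketch_pageaddr_matches(const SketchPageHeader *hdr, XLogRecPtr expected)
{
	return hdr->xlp_pageaddr == expected;
}
```

Note that a mismatch here cannot distinguish a recycled page from a corrupted
address field; it only says the page is not the one we were looking for.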

Regards,
Yugo Nagata

--
Yugo NAGATA <nagata(at)sraoss(dot)co(dot)jp>

Attachment Content-Type Size
report_corrupt_page_header.patch text/x-diff 783 bytes
