Re: Should walsernder check correctness of WAL records?

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Michael Paquier <michael(at)paquier(dot)xyz>, "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Should walsernder check correctness of WAL records?
Date: 2020-10-02 12:42:46
Message-ID: 14e95078-5fde-c784-20c7-dda4bc399d37@postgrespro.ru
Lists: pgsql-hackers

On 02.10.2020 3:28, Michael Paquier wrote:
> On Fri, Oct 02, 2020 at 12:16:25AM +0000, tsunakawa(dot)takay(at)fujitsu(dot)com wrote:
>> IIUC, walsender tries hard to send WAL as fast as possible to reduce
>> replication lag and transaction response time, so it doesn't try to
>> peek each WAL record. I think it's good.
> CRC calculation would unlikely be the bottleneck here, no? I would
> assume that the extra lseek() calls needed to look after the record
> data to be more harmful.
When would we need to perform any lseeks?
wal-sender and wal-receiver deal only with raw sequences of bytes.
They do not try to split the input stream into WAL records.
If we have to process the input data using the WAL reader, then I am
afraid it will itself add quite noticeable overhead.
Using the standard WAL reader seems to be very inefficient in this case,
because it unpacks WAL records.
We do not need that: the only required thing is to extract the WAL
record length from the header and calculate the CRC.
The main difficulty is that a WAL record can span several pages, so we
need to accumulate the checksum somewhere
and seek backward to the beginning of the record once we find a CRC
mismatch.
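
To make the idea concrete, here is a minimal sketch in Python rather than
PostgreSQL's C. It assumes a simplified, hypothetical framing (a 4-byte
total length and a 4-byte CRC followed by the payload) and uses
zlib.crc32 as a stand-in. The real XLogRecord header, the WAL page
headers and the CRC-32C that PostgreSQL uses are all ignored here. The
names HEADER, make_record, StreamChecker and first_bad_record are
invented for illustration. The point is only that the checksum is
accumulated across chunk boundaries and that the start offset of the
record is remembered, so we know where to go back to on a mismatch.

```python
import struct
import zlib

# Hypothetical header: total record length, then CRC of the payload.
HEADER = struct.Struct("<II")


def make_record(payload: bytes) -> bytes:
    """Build one record in the simplified (non-PostgreSQL) framing."""
    return HEADER.pack(HEADER.size + len(payload), zlib.crc32(payload)) + payload


class StreamChecker:
    """Verify record CRCs in a byte stream that arrives in arbitrary chunks."""

    def __init__(self):
        self.pos = 0        # absolute offset of the next unread byte
        self.rec_start = 0  # offset where the current record begins
        self.header = b""   # header bytes collected so far
        self.remaining = 0  # payload bytes still expected
        self.crc = 0        # running CRC of the payload seen so far
        self.expected = 0   # CRC stored in the record header

    def feed(self, chunk: bytes):
        """Return the start offset of the first bad record, or None."""
        i = 0
        while i < len(chunk):
            if len(self.header) < HEADER.size:
                if not self.header:
                    self.rec_start = self.pos
                take = min(HEADER.size - len(self.header), len(chunk) - i)
                self.header += chunk[i:i + take]
                if len(self.header) == HEADER.size:
                    total, self.expected = HEADER.unpack(self.header)
                    if total < HEADER.size:
                        return self.rec_start  # implausible length
                    self.remaining = total - HEADER.size
                    self.crc = 0
            else:
                take = min(self.remaining, len(chunk) - i)
                # Continue the CRC from the value accumulated so far.
                self.crc = zlib.crc32(chunk[i:i + take], self.crc)
                self.remaining -= take
            i += take
            self.pos += take
            if len(self.header) == HEADER.size and self.remaining == 0:
                if self.crc != self.expected:
                    return self.rec_start  # where we would seek back to
                self.header = b""          # record is fine, start the next one
        return None


def first_bad_record(stream: bytes, chunk_size: int = 8192):
    """Feed the stream chunk by chunk, as a sender loop might."""
    checker = StreamChecker()
    for off in range(0, len(stream), chunk_size):
        bad = checker.feed(stream[off:off + chunk_size])
        if bad is not None:
            return bad
    return None
```

Note that even in this toy version the mismatch is detected only after
the last byte of the record has been seen, which may be several chunks
after the record start was already passed on.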

>> In any case, the WAL can get corrupt during transmission, and
>> writing and reading on the standby. So, the standby needs to check
>> the WAL record CRC.
> Yep. However, I would worry much more about the case of cold
> archives. In my experience, there are higher risks to get a WAL
> segment corrupted because it was on disk and that this disk got
> corrupted. Transmission is a one-time short operation. Cold archives
> could stay on disk for weeks before getting reused in WAL replay.
> --
> Michael

So right now neither wal-sender nor wal-receiver checks the CRC.
We check records only when applying them.
But that seems to be too late for correct recovery.

Since wal-sender advances the replication slot position according to the
flush position at the replica,
by the moment we detect a corrupted record the restart LSN may
already be set past this record.
Even if we perform WAL archiving and the archive fortunately contains a
correct (not corrupted) copy of the WAL segment,
we will have to copy this WAL segment not only to the master but also to
all replicas.
Is that acceptable?
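
The ordering problem can be shown with a toy model. This is a Python
sketch under stated assumptions, not the actual slot code: Slot,
on_flush_feedback and segment_still_required are hypothetical names,
LSNs are plain integers, and the master is assumed to keep only the
segments at or after the one containing the slot's restart LSN. The
segment size is the default 16 MB.

```python
from dataclasses import dataclass

SEGMENT_SIZE = 16 * 1024 * 1024  # default WAL segment size


@dataclass
class Slot:
    """Toy replication slot: only tracks the restart LSN."""
    restart_lsn: int = 0

    def on_flush_feedback(self, flush_lsn: int) -> None:
        # The replica reports how far it has flushed raw WAL bytes;
        # nothing has verified the records in those bytes yet.
        self.restart_lsn = max(self.restart_lsn, flush_lsn)


def segment_still_required(record_lsn: int, slot: Slot) -> bool:
    """True if the segment holding record_lsn is still retained for the slot."""
    return record_lsn // SEGMENT_SIZE >= slot.restart_lsn // SEGMENT_SIZE


slot = Slot()
slot.on_flush_feedback(5 * SEGMENT_SIZE + 100)  # replica flushed into segment 5

# Replay later finds a bad record back in segment 2: in this model the
# master no longer has to keep that segment for the slot.
corrupted_lsn = 2 * SEGMENT_SIZE + 10
print(segment_still_required(corrupted_lsn, slot))
```

In this model the only remaining source for a good copy of the segment
is the archive.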

So I am not sure whether earlier detection of a CRC mismatch can help us
recover from this error.
And isn't the price for it too high?

I wonder what other actions we can perform at the master or at a replica
to handle this situation?
For example, if we detect record corruption at the wal-sender and the
corrupted record contains an FPW (full-page write),
we can try to replace the image of the buffer in the record with the
current page image.
But that is only possible if the page has not changed since this WAL
record was created.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
