Should walsender check correctness of WAL records?

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Should walsender check correctness of WAL records?
Date: 2020-10-01 15:38:40
Message-ID: 5a763a6c-aa3d-ac10-a54a-e372d2e2c762@postgrespro.ru
Lists: pgsql-hackers

Hi hackers,

While investigating a customer support case, I found that walsender
does not compute the CRC of WAL records and sends them to replicas
without any checks.
As a result, a damaged WAL record causes errors on all replicas:

        LOG: incorrect resource manager data checksum in record at 5FB9/D199F7D8
        FATAL: terminating walreceiver process due to administrator command

I wonder whether it would be better to detect this problem earlier, on the master?
We could try to recover the damaged WAL record (it is not always possible, but...)
or at least not advance replication slots, so that the DBA can restore the
corrupted WAL segment from the archive and resume replication.
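
To make the idea of a sender-side check concrete, here is a minimal sketch of
verifying a record's checksum before it is shipped. It is not PostgreSQL code:
DemoRecord is a hypothetical, simplified layout (not XLogRecord), and
crc32c_update, demo_record_crc and demo_record_is_valid are names invented for
this illustration. The only thing taken from PostgreSQL is that WAL records are
protected by CRC-32C; the bitwise loop below is a slow, portable stand-in for
an optimized implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
 * Slow but dependency-free; the caller handles init and final XOR. */
static uint32_t crc32c_update(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;

    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

/* Hypothetical record header, for illustration only.
 * The payload bytes follow the header directly in memory. */
typedef struct DemoRecord {
    uint32_t total_len;   /* header plus payload, in bytes */
    uint32_t crc;         /* CRC-32C over payload, then header up to crc */
} DemoRecord;

/* Compute the checksum the record is expected to carry. */
static uint32_t demo_record_crc(const DemoRecord *rec)
{
    const unsigned char *payload =
        (const unsigned char *) rec + sizeof(DemoRecord);
    uint32_t crc = 0xFFFFFFFFu;

    assert(rec != NULL && rec->total_len >= sizeof(DemoRecord));
    crc = crc32c_update(crc, payload, rec->total_len - sizeof(DemoRecord));
    crc = crc32c_update(crc, rec, offsetof(DemoRecord, crc));
    return crc ^ 0xFFFFFFFFu;
}

/* Return true only if the length is sane for the avail bytes we hold
 * and the stored checksum matches the recomputed one. */
static bool demo_record_is_valid(const DemoRecord *rec, size_t avail)
{
    if (avail < sizeof(DemoRecord))
        return false;
    if (rec->total_len < sizeof(DemoRecord) || rec->total_len > avail)
        return false;
    return demo_record_crc(rec) == rec->crc;
}
```

A sender built along these lines would call demo_record_is_valid() on each
record before transmitting it and, on failure, stop and report the position
instead of advancing, which leaves room to restore the segment from the archive.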

Right now the only option is to rebuild the replicas from a base backup,
which may take a significant amount of time for a large database.
During this time the master is not protected from failures.

Or is the extra overhead of computing the CRC in walsender assumed to be too high?

Sorry if this question has already been discussed; I failed to find it in
the archive.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
