Re: Checksums by default?

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Checksums by default?
Date: 2017-02-13 02:41:55
Message-ID: 34e15a92-0bde-6809-b7ba-0cc1681635ab@2ndquadrant.com
Lists: pgsql-hackers

On 02/13/2017 02:29 AM, Jim Nasby wrote:
> On 2/10/17 6:38 PM, Tomas Vondra wrote:
>> And no, backups may not be a suitable solution - the failure happens on
>> a standby, and the page (luckily) is not corrupted on the master. Which
>> means that perhaps the standby got corrupted by a WAL, which would
>> affect the backups too. I can't verify this, though, because the WAL got
>> removed from the archive, already. But it's a possibility.
>
> Possibly related... I've got a customer that periodically has SR replicas
> stop in their tracks due to WAL checksum failure. I don't think there's
> any hardware correlation (they've seen this on multiple machines).
> Studying the code, it occurred to me that if there are any bugs in the
> handling of individual WAL record sizes or pointers during SR then you
> could get CRC failures. So far every one of these occurrences has been
> repairable by replacing the broken WAL file on the replica. I've
> requested that next time this happens they save the bad WAL.

I don't follow. You're talking about WAL checksums, but this thread is about
data checksums. I'm not seeing any WAL checksum failure, but when the
standby attempts to apply the WAL (in particular a Btree/DELETE WAL
record), it detects an incorrect data checksum in the underlying table.

So either there's a hardware issue, or the heap got corrupted by some
preceding WAL record. Or maybe one of the tiny gnomes in the CPU got tired and
punched the bits wrong.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
