From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
---|---|
To: | Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Checksums by default? |
Date: | 2017-02-13 02:41:55 |
Message-ID: | 34e15a92-0bde-6809-b7ba-0cc1681635ab@2ndquadrant.com |
Lists: | pgsql-hackers |
On 02/13/2017 02:29 AM, Jim Nasby wrote:
> On 2/10/17 6:38 PM, Tomas Vondra wrote:
>> And no, backups may not be a suitable solution - the failure happens on
>> a standby, and the page (luckily) is not corrupted on the master. Which
>> means that perhaps the standby got corrupted by a WAL, which would
>> affect the backups too. I can't verify this, though, because the WAL got
>> removed from the archive, already. But it's a possibility.
>
> Possibly related... I've got a customer that periodically has SR replicas
> stop in their tracks due to WAL checksum failure. I don't think there's
> any hardware correlation (they've seen this on multiple machines).
> Studying the code, it occurred to me that if there are any bugs in the
> handling of individual WAL record sizes or pointers during SR then you
> could get CRC failures. So far every one of these occurrences has been
> repairable by replacing the broken WAL file on the replica. I've
> requested that next time this happens they save the bad WAL.
I don't follow. You're talking about WAL checksums, this thread is about
data checksums. I'm not seeing any WAL checksum failure, but when the
standby attempts to apply the WAL (in particular a Btree/DELETE WAL
record), it detects an incorrect data checksum in the underlying table.
So either there's a hardware issue, or the heap got corrupted by some
preceding WAL. Or maybe one of the tiny gnomes in the CPU got tired and
punched the bits wrong.
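To make the distinction concrete, here is a toy Python model of the two independent checks. It is an illustration only, not PostgreSQL code. The WAL record carries a CRC-32C over its own bytes, checked when the record is read. The data checksum lives inside the heap page and is checked when redo reads that page from disk. So a record can pass its CRC and still touch a page whose checksum fails, which is the case above. The record layout, the 4-byte checksum at offset 8, and the plain 32-bit FNV-1a hash are all hypothetical simplifications. The real pd_checksum is 16 bits, computed (as I understand it) by a parallelized FNV-1a variant in checksum_impl.h.

```python
import struct
from typing import Tuple

# --- WAL side: CRC-32C over the record bytes -------------------------------
# Bitwise CRC-32C (Castagnoli), the CRC family used for WAL records.
CRC32C_POLY = 0x82F63B78  # reflected Castagnoli polynomial


def crc32c(data: bytes) -> int:
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ CRC32C_POLY if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


WalRecord = Tuple[bytes, int]  # hypothetical toy layout: (payload, stored CRC)


def make_wal_record(payload: bytes) -> WalRecord:
    return payload, crc32c(payload)


def wal_record_ok(record: WalRecord) -> bool:
    payload, stored_crc = record
    return crc32c(payload) == stored_crc


# --- Data side: checksum stored inside the page itself ---------------------
BLCKSZ = 8192        # default PostgreSQL block size
CHECKSUM_OFFSET = 8  # hypothetical: 4-byte checksum right after an 8-byte LSN
FNV_OFFSET = 2166136261
FNV_PRIME = 16777619


def page_checksum(page: bytes, blkno: int) -> int:
    """Stand-in hash: 32-bit FNV-1a over the page with the checksum field
    zeroed, XORed with the block number so a misplaced page also fails."""
    buf = bytearray(page)
    buf[CHECKSUM_OFFSET:CHECKSUM_OFFSET + 4] = bytes(4)
    h = FNV_OFFSET
    for byte in buf:
        h = ((h ^ byte) * FNV_PRIME) & 0xFFFFFFFF
    return (h ^ blkno) & 0xFFFFFFFF


def set_page_checksum(page: bytearray, blkno: int) -> None:
    struct.pack_into("<I", page, CHECKSUM_OFFSET, page_checksum(page, blkno))


def verify_page(page: bytes, blkno: int) -> bool:
    (stored,) = struct.unpack_from("<I", page, CHECKSUM_OFFSET)
    return stored == page_checksum(page, blkno)


# --- Redo: the two checks are independent ----------------------------------
def replay(record: WalRecord, page: bytes, blkno: int) -> str:
    if not wal_record_ok(record):
        return "WAL CRC failure"        # the record itself is damaged
    if not verify_page(page, blkno):
        return "data checksum failure"  # the record is fine, the page is not
    return "ok"


if __name__ == "__main__":
    page = bytearray(BLCKSZ)
    page[100:105] = b"tuple"
    set_page_checksum(page, 42)
    rec = make_wal_record(b"Btree/DELETE")
    print(replay(rec, bytes(page), 42))  # ok
    page[200] ^= 0x01                    # one flipped bit in the heap page
    print(replay(rec, bytes(page), 42))  # data checksum failure
```

The toy only shows that the two failures are detected at different places; it says nothing about which of the causes above (hardware, an earlier bad WAL record, or gnomes) is the real one.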
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services