Re: corrupt pages detected by enabling checksums

From: Jim Nasby <jim(at)nasby(dot)net>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: corrupt pages detected by enabling checksums
Date: 2013-05-08 22:56:13
Message-ID: 518AD80D.1060904@nasby.net
Lists: pgsql-hackers

On 4/5/13 6:39 PM, Jeff Davis wrote:
> On Fri, 2013-04-05 at 10:34 +0200, Florian Pflug wrote:
>> Maybe we could scan forward to check whether a corrupted WAL record is
>> followed by one or more valid ones with sensible LSNs. If it is,
>> chances are high that we haven't actually hit the end of the WAL. In
>> that case, we could either log a warning, or (better, probably) abort
>> crash recovery.
>
> +1.
>
>> Corruption of fields which we require to scan past the record would
>> cause false negatives, i.e. not trigger an error even though we do
>> abort recovery mid-way through. There's a risk of false positives too,
>> but they require quite specific orderings of writes and thus seem
>> rather unlikely. (AFAICS, the OS would have to write some parts of
>> record N followed by the whole of record N+1 and then crash to cause a
>> false positive).
>
> Does the xlp_pageaddr help solve this?
>
> Also, we'd need to be a little careful when written-but-not-flushed WAL
> data makes it to disk, which could cause a false positive and may be a
> fairly common case.
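
A minimal sketch of the scan-forward idea quoted above, as a toy model rather than PostgreSQL's real WAL format. The `Record` layout, `zlib.crc32` as the checksum, and the function names are all assumptions made for illustration. The toy also assumes record boundaries past the bad record are already known, which is exactly what the false-negative caveat above says cannot be relied on.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Record:
    lsn: int        # toy log sequence number, expected to increase
    payload: bytes
    crc: int        # checksum stored alongside the record


def checksum(lsn, payload):
    # Hypothetical checksum: CRC-32 over the LSN followed by the payload.
    return zlib.crc32(lsn.to_bytes(8, "big") + payload)


def make_record(lsn, payload):
    return Record(lsn, payload, checksum(lsn, payload))


def is_valid(rec):
    return rec.crc == checksum(rec.lsn, rec.payload)


def classify_first_bad_record(records):
    """Return 'clean', 'end-of-wal', or 'suspected-corruption'."""
    last_good_lsn = -1
    for i, rec in enumerate(records):
        if is_valid(rec):
            last_good_lsn = rec.lsn
            continue
        # First invalid record found: scan forward for a valid record
        # whose LSN is newer than anything replayed so far.
        for later in records[i + 1:]:
            if is_valid(later) and later.lsn > last_good_lsn:
                return "suspected-corruption"
        # Nothing valid and newer follows, so treat this as the end of WAL.
        return "end-of-wal"
    return "clean"
```

In this toy, a valid record with an older LSN after the bad one (for example, leftover data in a reused segment) does not count as evidence of corruption, which is one reading of "sensible LSNs". The written-but-not-flushed case raised above is not modelled here.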

Apologies if this is a stupid question, but is this mostly an issue because of torn pages? IOW, if we had a way to ensure we never see torn pages, would an invalid CRC on a WAL page then mean that there really was corruption on that page?
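
To make the torn-page case concrete, here is a toy sketch, not PostgreSQL code. It assumes a 512-byte atomic sector, an 8192-byte page, and a single CRC-32 covering the whole page as the question phrases it; `torn_write` is a hypothetical helper. It only shows that a partially written page fails its checksum even though no byte was damaged by the storage itself.

```python
import zlib

SECTOR = 512   # assumed atomic write unit
PAGE = 8192    # assumed page size


def torn_write(old_page, new_page, sectors_written):
    """Simulate a crash after only the first `sectors_written` sectors hit disk."""
    cut = sectors_written * SECTOR
    return new_page[:cut] + old_page[cut:]


old_page = bytes(PAGE)              # previous contents: all zeroes
new_page = bytes([0xAB]) * PAGE     # contents the writer intended
expected_crc = zlib.crc32(new_page)

# Crash after 3 of the 16 sectors: the page is a mix of new and old data.
on_disk = torn_write(old_page, new_page, sectors_written=3)
crc_ok = zlib.crc32(on_disk) == expected_crc  # False for the torn page
```

A reader of `on_disk` sees only a checksum mismatch, so in this toy model the torn write and genuine corruption look the same.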

Maybe it's worth putting (yet more) thought into the torn page issue... :/
--
Jim C. Nasby, Data Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net
