Re: Online verification of checksums

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Julien Rouhaud <rjuju123(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Banck <michael(dot)banck(at)credativ(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Online verification of checksums
Date: 2019-03-08 17:49:56
Message-ID: 986bd2ca-4318-f8e4-8a5f-320082d31f52@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 3/8/19 4:19 PM, Julien Rouhaud wrote:
> On Thu, Mar 7, 2019 at 7:00 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>>
>> On 2019-03-07 12:53:30 +0100, Tomas Vondra wrote:
>>>
>>> But then again, we could just
>>> hack a special version of ReadBuffer_common() which would just
>>
>>> (a) check if a page is in shared buffers, and if it is then consider the
>>> checksum correct (because in memory it may be stale, and it was read
>>> successfully so it was OK at that moment)
>>>
>>> (b) if it's not in shared buffers already, try reading it and verify the
>>> checksum, and then just evict it right away (not to spoil sb)
>>
>> This'd also make sense and make the whole process more efficient. OTOH,
>> it might actually be worthwhile to check the on-disk page even if
>> there's in-memory state. Unless IO is in progress the on-disk page
>> always should be valid.
>
> Definitely. I already saw servers with all-frozen-read-only blocks
> popular enough to never get evicted in months, and then a minor
> upgrade / restart having catastrophic consequences.
>

Do I understand correctly the "catastrophic consequences" here are due
to data corruption / broken checksums on those on-disk pages?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Julien Rouhaud 2019-03-08 17:59:41 Re: Online verification of checksums
Previous Message Tom Lane 2019-03-08 17:14:09 Re: Why don't we have a small reserved OID range for patch revisions?