Re: Online verification of checksums

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Michael Banck <michael(dot)banck(at)credativ(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Online verification of checksums
Date: 2019-03-06 17:26:58
Message-ID: CA+TgmoZ1wb5x2YRG-Rut4Nb2_K+T6326nGKf0DP_y1bVLsi1bA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Mar 2, 2019 at 4:38 PM Tomas Vondra
<tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> FWIW I don't think this qualifies as torn page - i.e. it's not a full
> read with a mix of old and new data. This is partial write, most likely
> because we read the blocks one by one, and when we hit the last page
> while the table is being extended, we may only see the fist 4kB. And if
> we retry very fast, we may still see only the first 4kB.

I see the distinction you're making, and you're right. The problem
is, whether in this case or whether for a real torn page, we don't
seem to have a way to distinguish between a state that occurs
transiently due to lack of synchronization and a situation that is
permanent and means that we have corruption. And that worries me,
because it means we'll either report bogus complaints that will scare
easily-panicked users (and anybody who is running this tool has a good
chance of being in the "easily-panicked" category ...), or else we'll
skip reporting real problems. Neither is good.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2019-03-06 17:33:49 Re: Online verification of checksums
Previous Message Alvaro Herrera 2019-03-06 17:13:37 Re: patch to allow disable of WAL recycling