Re: Online verification of checksums

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Stephen Frost <sfrost(at)snowman(dot)net>, Michael Banck <michael(dot)banck(at)credativ(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Online verification of checksums
Date: 2019-03-04 14:08:09
Message-ID: 42c56652-bec1-9a6b-a765-979709457cf1@2ndquadrant.com
Lists: pgsql-hackers

On 3/4/19 2:00 AM, Michael Paquier wrote:
> On Sun, Mar 03, 2019 at 03:12:51AM +0100, Tomas Vondra wrote:
>> You and Andres may be right that trying to verify checksums online
>> without close interaction with the server is ultimately futile (or at
>> least overly complex). But I'm not sure those issues (torn pages and
>> partial reads) are very good arguments, considering basebackup has to
>> deal with them too. Not sure.
>
> FWIW, I don't think that the backend is right in its way of checking
> checksums the way it does currently either with warnings and a limited
> set of failures generated. I raised concerns about that unfortunately
> after 11 has been GA'ed, which was too late, so this time, for this
> patch, I prefer raising them before the fact and I'd rather not spread
> this kind of methodology around the core code more and more.

I still don't understand what issue you see in how basebackup verifies
checksums. Can you point me to the explanation you sent after 11 was
released?

> I work a lot with virtualization, and I have seen ESX hanging around
> I/O requests from time to time depending on the environment used
> (which is actually wrong, anyway, but a lot of tests happen on a
> daily basis on the stuff I work on). What's presented on this thread
> is *never* going to be 100% safe, and would generate false positives
> which can be confusing for the user. This is not a good sign.

So you have a workload/configuration that actually results in data
corruption, yet we fail to detect it? Or do we generate false positives?
Or what do you mean by "100% safe" here?

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
