Re: Online verification of checksums

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Michael Banck <michael(dot)banck(at)credativ(dot)de>, Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Online verification of checksums
Date: 2019-03-29 15:38:02
Message-ID: 20190329153802.GM6197@tamriel.snowman.net
Lists: pgsql-hackers

Greetings,

* Andres Freund (andres(at)anarazel(dot)de) wrote:
> On 2019-03-29 11:30:15 -0400, Stephen Frost wrote:
> > * Magnus Hagander (magnus(at)hagander(dot)net) wrote:
> > > On Thu, Mar 28, 2019 at 10:19 PM Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
> > > wrote:
> > > > On Thu, Mar 28, 2019 at 01:11:40PM -0700, Andres Freund wrote:
> > > > >Hi,
> > > > >
> > > > >On 2019-03-28 21:09:22 +0100, Michael Banck wrote:
> > > > >> I agree that the current patch might have some corner-cases where it
> > > > >> does not guarantee 100% accuracy in online mode, but I hope the current
> > > > >> version at least has no more false negatives.
> > > > >
> > > > >False positives are *bad*. We shouldn't integrate code that has them.
> > > > >
> > > >
> > > > Yeah, I agree. I'm a bit puzzled by the reluctance to make the online mode
> > > > communicate with the server, which would presumably address these issues.
> > > > Can someone explain why not to do that?
> > >
> > > I agree that this effort seems better spent on fixing those issues there
> > > (of which many are the same), and then re-use that.
> >
> > This really seems like it depends on which of the options we're talking
> > about.. Connecting to the server and asking what the current insert
> > point is, so we can check that the LSN isn't completely insane, seems
> > reasonable, but at least one option being discussed was to have
> > pg_basebackup actually *lock the page* (even if just for I/O..) and then
> > re-read it, and having an external tool doing that instead of the
> > backend seems like a whole different level to me. That would involve
> > having an SQL function for "lock this page against I/O" and then another
> > for "unlock this page", wouldn't it?
>
> No, I don't think so. And we obviously couldn't have a SQL level
> function hold an LWLock after it has finished, that'd make undetected
> deadlocks triggerable by users. The way I'd imagine that being done is
> to just perform the checksum test in the commandline tool, and whenever
> there's a checksum failure that could plausibly be a torn read, call a
> server side function that re-tests the page after locking it. Which then
> would just return the error message in a string.

So the server-side function would essentially lock the page against I/O,
re-read it from disk into an independent buffer, unlock the page, and
then calculate the checksum and report back?

That seems reasonable to me. Wouldn't it make sense to then have
pg_basebackup use that same function?
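
To make the sequence concrete, here is a rough sketch of that flow as a
toy Python model. None of this is PostgreSQL code: ToyServer,
recheck_page and verify_block are hypothetical names, CRC32 stands in
for the real page checksum, and a threading.Lock stands in for whatever
the backend would use to lock the page against I/O.

```python
import threading
import zlib

PAGE_SIZE = 8192
CHECKSUM_BYTES = 4  # assumption: toy layout, CRC32 kept in the first 4 bytes


def page_checksum(payload: bytes) -> int:
    # Stand-in checksum; the real page checksum algorithm is different.
    return zlib.crc32(payload) & 0xFFFFFFFF


def make_page(payload: bytes) -> bytes:
    # Pad the payload and prepend its checksum to form a full page.
    payload = payload.ljust(PAGE_SIZE - CHECKSUM_BYTES, b"\0")
    return page_checksum(payload).to_bytes(CHECKSUM_BYTES, "little") + payload


def checksum_ok(page: bytes) -> bool:
    stored = int.from_bytes(page[:CHECKSUM_BYTES], "little")
    return stored == page_checksum(page[CHECKSUM_BYTES:])


class ToyServer:
    """Hypothetical stand-in for the server side of the re-check."""

    def __init__(self):
        self.disk = {}      # block number -> page bytes
        self.io_locks = {}  # block number -> lock guarding I/O on that page

    def _lock_for(self, blkno):
        return self.io_locks.setdefault(blkno, threading.Lock())

    def write_page(self, blkno, page):
        with self._lock_for(blkno):
            self.disk[blkno] = page

    def recheck_page(self, blkno):
        # Lock the page against I/O, re-read it into an independent
        # buffer, and unlock again before doing any checksum work.
        with self._lock_for(blkno):
            private_copy = bytes(self.disk[blkno])
        if checksum_ok(private_copy):
            return None
        # Report the failure back as a plain string.
        return f"checksum verification failed for block {blkno}"


def verify_block(server, blkno, unlocked_read):
    # Client side: check the page that was read without any locking and
    # only ask the server when that check fails (possible torn read).
    if checksum_ok(unlocked_read):
        return None
    return server.recheck_page(blkno)
```

In this model a torn read (first half of the new page, second half of
the old one) fails the client-side check but passes the locked re-read,
so verify_block returns None. A page that is actually damaged on disk
fails both checks and comes back as an error string.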

Thanks,

Stephen
