Re: basebackup checksum verification

From: Andres Freund <andres(at)anarazel(dot)de>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org, Magnus Hagander <magnus(at)hagander(dot)net>, Michael Banck <michael(dot)banck(at)credativ(dot)de>, David Steele <david(at)pgmasters(dot)net>
Subject: Re: basebackup checksum verification
Date: 2019-03-26 23:49:21
Message-ID: 20190326234921.t3our43uya6bncdt@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2019-03-26 19:22:03 -0400, Stephen Frost wrote:
> Greetings,
>
> * Andres Freund (andres(at)anarazel(dot)de) wrote:
> > As detailed in
> > https://postgr.es/m/20190319200050.ncuxejradurjakdc%40alap3.anarazel.de
> > the way the backend's basebackup checksum verification works makes its
> > error detection capabilities very dubious.
>
> I disagree that it's 'very dubious', even with your analysis.

I really don't know what to say. The current algorithm is flat out
bogus.

> I thought Robert's response was generally good, pointing out that
> we're talking about this being an issue if the corruption happens in a
> certain set of bytes. That said, I'm happy to see improvements in
> this area but I'm flat out upset about the notion that we must be
> perfect here- our checksums themselves aren't perfect for catching
> corruption either.

The point is that we're not detecting errors that we can detect when
read outside of basebackup. I really entirely completely fail how that
can be defended.

I think we're making promises with this the basebackup feature we're not
even remotely keeping. I don't understand how you can defend that, given
the current state, you can have a basebackup that you took with
checksums enabled, and then when actually use that basebackup you get
checksum failures. Like it's one thing not to detect all storage
issues, but if we do detect them after using the basebackup, that's
really not ok.

> > 2) If 1) errored out, ensure that that's because the backend is
> > currently writing out the page. That basically requires doing what
> > BufferAlloc() does. So I think we'd need to end up with a function
> > like:
> >
> > LockoutBackendWrites():
> > buf = ReadBufferWithoutRelcache(relfilenode);
>
> This is going to cause it to be pulled into shared buffers, if it isn't
> already there, isn't it?

I can't see that being a problem. We're only going to enter this path if
we encountered a buffer where the checksum was wrong. And either that's
a data corruption even in which case we don't care about a small
performance penalty, or it's a race around writing out the page because
basebackup read it while half writen - in which case it's pretty pretty
likely that the page is still in shared buffers.

> That seems less than ideal and isn't it going
> to end up just doing exactly the same PageIsVerified() call, and what
> happens when that fails?

That seems quite easily handled with a different RBM_ mode.

> > LWLockAcquire(BufferDescriptorGetContentLock(buf), LW_EXCLUSIVE);
> > /*
> > * Reread page from OS, and recheck. This needs to happen while
> > * the IO lock prevents rereading from the OS. Note that we do
> > * not want to rely on the buffer contents here, as that could
> > * be very old cache contents.
> > */
> > perform_checksum_check(relfilenode, ERROR);
> >
> > LWLockRelease(BufferDescriptorGetContentLock(buf), LW_EXCLUSIVE);
> > ReleaseBuffer(buf);

This should be the IO lock, not content lock, sorry. Copy & pasto.

> I don't particularly like having to lock pages in this way while
> performing this check, espectially with having to read the page into
> shared buffers potentially.

Given it's only the IO lock (see above correction), and only if we can't
verify the checksum during the race, I fail to see how that can be a
problem?

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2019-03-27 00:06:45 pgsql: Compute XID horizon for page level index vacuum on primary.
Previous Message Alvaro Herrera 2019-03-26 23:27:33 Re: partitioned tables referenced by FKs