Re: Enabling Checksums

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Markus Wanner <markus(at)bluegap(dot)ch>, Jesper Krogh <jesper(at)krogh(dot)cc>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Enabling Checksums
Date: 2012-11-19 21:46:23
Message-ID: 1353361583.1102.18.camel@sussancws0025
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 2012-11-19 at 10:35 -0800, Jeff Davis wrote:
> Yes, the blocks written *after* the checkpoint might have a bad checksum
> that will be fixed during recovery. But the blocks written *before* the
> checkpoint should have a valid checksum, but if they don't, then
> recovery doesn't know about them.
>
> So, we can't verify the checksums in the base backup because it's
> expected that some blocks will fail the check, and they can be fixed
> during recovery. That gives us no protection for blocks that were truly
> corrupted and written long before the last checkpoint.
>
> I suppose if we could somehow differentiate the blocks, that might work.
> Maybe look at the LSN and only validate blocks written before the
> checkpoint? But of course, that's a problem because a corrupt block
> might have the wrong LSN (in fact, it's likely, because garbage is more
> likely to make the LSN too high than too low).

It might be good enough here to simply retry the checksum verification
if it fails for any block. Postgres shouldn't be issuing write()s for
the same block very frequently, and they shouldn't take very long, so
the chances of failing several times seems vanishingly small unless it's
a real failure.

Through a suitably complex mechanism, I think we can be more sure. The
external program could wait for a checkpoint (or force one manually),
and then recalculate the checksum for that page. If checksum is the same
as the last time, then we know the block is bad (because the checkpoint
would have waited for any writes in progress). If the checksum does
change, then we assume postgres must have modified it since the backup
started, so we can assume that we have a full page image to fix it. (A
checkpoint is a blunt tool here, because all we need to do is wait for
the write() call to finish, but it suffices.)

That complexity is probably not required, and simply retrying a few
times is probably much more practical. But it still bothers me a little
to think that the external tool could falsely indicate a checksum
failure, however remote that chance.

Regards,
Jeff Davis

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2012-11-19 21:58:19 Re: WIP: index support for regexp search
Previous Message Alvaro Herrera 2012-11-19 21:36:28 Re: [v9.3] Extra Daemons (Re: elegant and effective way for running jobs inside a database)