Re: Enabling Checksums

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Markus Wanner <markus(at)bluegap(dot)ch>, Jesper Krogh <jesper(at)krogh(dot)cc>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Enabling Checksums
Date: 2012-11-19 18:35:45
Message-ID: 1353350145.10198.130.camel@jdavis-laptop
Lists: pgsql-hackers

On Mon, 2012-11-19 at 18:30 +0100, Andres Freund wrote:
> Yes, definitely.

OK. I suppose that makes sense for large writes.

> > If that is not true, then I'm concerned about replicating corruption, or
> > backing up corrupt blocks over good ones. How do we prevent that? It
> > seems like a pretty major hole if we can't, because it means the only
> > safe replication is streaming replication; a base-backup is essentially
> > unsafe. And it means that even an online background checking utility
> > would be quite hard to do properly.
>
> I am not sure I see the danger in the base backup case here? Why would
> we have corrupted backup blocks? While postgres is running we won't see
> such torn pages because it's all done under proper locks...

Yes, the blocks written *after* the checkpoint might have a bad checksum
that will be fixed during recovery. The blocks written *before* the
checkpoint should have a valid checksum; if they don't, recovery has no
way of knowing about them.

So, we can't verify the checksums in the base backup because it's
expected that some blocks will fail the check, and they can be fixed
during recovery. That gives us no protection for blocks that were truly
corrupted and written long before the last checkpoint.

I suppose if we could somehow differentiate the blocks, that might work.
Maybe look at the LSN and only validate blocks written before the
checkpoint? But of course, that's a problem because a corrupt block
might have the wrong LSN (in fact, it's likely, because garbage is more
likely to make the LSN too high than too low).
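
To make the LSN idea and its weakness concrete, here is a rough sketch.
It is not PostgreSQL code: the page layout (an 8-byte LSN followed by a
4-byte checksum), the use of CRC32, and the names make_page and classify
are all assumptions made up for illustration.

```python
import struct
import zlib

PAGE_SIZE = 8192
# Hypothetical layout: 8-byte LSN, then 4-byte CRC32. Not the real page header.
HEADER = struct.Struct("<QI")


def make_page(lsn, payload=b""):
    """Build a toy page whose checksum covers the LSN and the body."""
    body = payload.ljust(PAGE_SIZE - HEADER.size, b"\0")
    crc = zlib.crc32(struct.pack("<Q", lsn) + body)
    return HEADER.pack(lsn, crc) + body


def classify(page, checkpoint_lsn):
    """Validate only pages whose LSN is older than the checkpoint."""
    lsn, stored = HEADER.unpack_from(page)
    if lsn >= checkpoint_lsn:
        # Assumed to be rewritten during recovery, so not checked.
        return "skip"
    body = page[HEADER.size:]
    ok = zlib.crc32(struct.pack("<Q", lsn) + body) == stored
    return "ok" if ok else "corrupt"


checkpoint = 1000

# Old page with a damaged body: caught.
damaged = bytearray(make_page(100))
damaged[50] ^= 0xFF
print(classify(bytes(damaged), checkpoint))

# Old page whose LSN bytes were overwritten with garbage: the check is
# skipped, so the corruption goes unnoticed.
garbage_lsn = b"\xff" * 8 + make_page(100)[8:]
print(classify(garbage_lsn, checkpoint))
```

The second case is the hole: if the garbage were uniformly random bits,
the chance that a 64-bit LSN lands below a checkpoint LSN C is only
C / 2^64, so almost every such page would be skipped rather than flagged.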

Regards,
Jeff Davis
