Re: 16-bit page checksums for 9.2

From: Noah Misch <noah(at)leadboat(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, david(at)fetter(dot)org, aidan(at)highrise(dot)ca, stark(at)mit(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 16-bit page checksums for 9.2
Date: 2012-02-22 07:06:49
Message-ID: 20120222070649.GF8592@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Feb 19, 2012 at 05:04:06PM -0500, Robert Haas wrote:
> On Sun, Feb 19, 2012 at 11:35 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> >> So, when the page has a checksum, PD_CHECKSUM2 is not set, and when it
> >> doesn't have a checksum, PD_CHECKSUM2 is not set? ?What good does that
> >> do?
> >
> > As explained in detailed comments, the purpose of this is to implement
> > Heikki's suggestion that we have a bit set to zero so we can detect
> > failures that cause a run of 1s.
>
> I think it's nonsensical to pretend that there's anything special
> about that particular bit. If we want to validate the page header
> before trusting the lack of a checksum bit, we can do that far more
> thoroughly than just checking that one bit. There are a whole bunch
> of bits that ought to always be zero, and there are other things we
> can validate as well (e.g. LSN not in future). If we're concerned
> about the checksum-enabled bit getting flipped (and I agree that we
> should be), we can check some combination of that stuff in the hope of
> catching it, and that'll be a far better guard than just checking one
> arbitrarily selected bit.

PageHeaderIsValid() (being renamed to PageIsVerified()) already checks
"(page->pd_flags & ~PD_VALID_FLAG_BITS) == 0". Explicitly naming another bit
and keeping it unset is redundant with that existing check. It would cease to
be redundant if we ever allocate all the flag bits, but then we also wouldn't
have a bit to spare as PD_CHECKSUM2. I agree with you on this point.

> That having been said, I don't feel very good about the idea of
> relying on the contents of the page to tell us whether or not the page
> has a checksum. There's no guarantee that an error that flips the
> has-checksum bit will flip any other bit on the page, or that it won't
> flip everything else we're relying on as a backstop in exactly the
> manner that foils whatever algorithm we put in place. Random
> corruption is, perhaps, unlikely to do that, but somehow I feel like
> it'll happen more often than random chance suggests. Things never
> fail the way you want them to.
>
> Another disadvantage of the current scheme is that there's no
> particularly easy way to know that your whole cluster has checksums.
> No matter how we implement checksums, you'll have to rewrite every
> table in the cluster in order to get them fully turned on. But with
> the current design, there's no easy way to know how much of the
> cluster is actually checksummed. If you shut checksums off, they'll
> linger until those pages are rewritten, and there's no easy way to
> find the relations from which they need to be removed, either.

I'm not seeing value in rewriting pages to remove checksums, as opposed to
just ignoring those checksums going forward. Did you have a particular
scenario in mind?

> I'm tempted to suggest a relation-level switch: when you want
> checksums, you use ALTER TABLE to turn them on, and when you don't
> want them any more you use ALTER TABLE to shut them off again, in each
> case rewriting the table. That way, there's never any ambiguity about
> what's in the data pages in a given relation: either they're either
> all checksummed, or none of them are. This moves the decision about
> whether checksums are enabled or disabled quite a but further away
> from the data itself, and also allows you to determine (by catalog
> inspection) which parts of the cluster do in fact have checksums. It
> might be kind of a pain to implement, though: you'd have to pass the
> information about how any given relation was configured down to the
> place where we validate page sanity. I'm not sure whether that's
> practical.

This patch implies future opportunities to flesh out its UI, and I don't see
it locking us out of implementing the above. We'll want a weak-lock command
that adds checksums to pages lacking them. We'll want to expose whether a
given relation has full checksum coverage. With that, we could produce an
error when an apparently-non-checksummed page appears in a relation previously
known to have full checksum coverage.

Even supposing an "ALTER TABLE t SET {WITH | WITHOUT} CHECKSUMS" as the only
tool for enabling or disabling them, it's helpful to have each page declare
whether it has a checksum. That way, the rewrite can take as little as an
AccessShareLock, and any failure need not lose work already done. If you rely
on anticipating the presence of a checksum based on a pg_class column, that
ALTER needs to perform an atomic rewrite.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Etsuro Fujita 2012-02-22 08:37:36 Re: REASSIGN OWNED lacks support for FDWs
Previous Message Noah Misch 2012-02-22 05:27:47 Re: Measuring relation free space