Re: 16-bit page checksums for 9.2

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, david(at)fetter(dot)org, aidan(at)highrise(dot)ca, stark(at)mit(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 16-bit page checksums for 9.2
Date: 2012-02-22 12:30:46
Message-ID: CA+U5nMKUHRwJ_YgrcaJaLxuTvdXDd9YwF5C1Ds7ayg2db8Hmuw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Feb 22, 2012 at 7:06 AM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> On Sun, Feb 19, 2012 at 05:04:06PM -0500, Robert Haas wrote:
>> On Sun, Feb 19, 2012 at 11:35 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> >> So, when the page has a checksum, PD_CHECKSUM2 is not set, and when it
>> >> doesn't have a checksum, PD_CHECKSUM2 is not set? ?What good does that
>> >> do?
>> >
>> > As explained in detailed comments, the purpose of this is to implement
>> > Heikki's suggestion that we have a bit set to zero so we can detect
>> > failures that cause a run of 1s.
>>
>> I think it's nonsensical to pretend that there's anything special
>> about that particular bit.  If we want to validate the page header
>> before trusting the lack of a checksum bit, we can do that far more
>> thoroughly than just checking that one bit.  There are a whole bunch
>> of bits that ought to always be zero, and there are other things we
>> can validate as well (e.g. LSN not in future).  If we're concerned
>> about the checksum-enabled bit getting flipped (and I agree that we
>> should be), we can check some combination of that stuff in the hope of
>> catching it, and that'll be a far better guard than just checking one
>> arbitrarily selected bit.
>
> PageHeaderIsValid() (being renamed to PageIsVerified()) already checks
> "(page->pd_flags & ~PD_VALID_FLAG_BITS) == 0".  Explicitly naming another bit
> and keeping it unset is redundant with that existing check.  It would cease to
> be redundant if we ever allocate all the flag bits, but then we also wouldn't
> have a bit to spare as PD_CHECKSUM2.  I agree with you on this point.

No problem to remove that. It was there at Heikki's request.

>> That having been said, I don't feel very good about the idea of
>> relying on the contents of the page to tell us whether or not the page
>> has a checksum.  There's no guarantee that an error that flips the
>> has-checksum bit will flip any other bit on the page, or that it won't
>> flip everything else we're relying on as a backstop in exactly the
>> manner that foils whatever algorithm we put in place.  Random
>> corruption is, perhaps, unlikely to do that, but somehow I feel like
>> it'll happen more often than random chance suggests.  Things never
>> fail the way you want them to.
>>
>> Another disadvantage of the current scheme is that there's no
>> particularly easy way to know that your whole cluster has checksums.
>> No matter how we implement checksums, you'll have to rewrite every
>> table in the cluster in order to get them fully turned on.  But with
>> the current design, there's no easy way to know how much of the
>> cluster is actually checksummed.  If you shut checksums off, they'll
>> linger until those pages are rewritten, and there's no easy way to
>> find the relations from which they need to be removed, either.
>
> I'm not seeing value in rewriting pages to remove checksums, as opposed to
> just ignoring those checksums going forward.  Did you have a particular
> scenario in mind?

Agreed. No reason to change a checksum unless we rewrite the block, no
matter whether page_checksums is on or off.

>> I'm tempted to suggest a relation-level switch: when you want
>> checksums, you use ALTER TABLE to turn them on, and when you don't
>> want them any more you use ALTER TABLE to shut them off again, in each
>> case rewriting the table.  That way, there's never any ambiguity about
>> what's in the data pages in a given relation: either they're either
>> all checksummed, or none of them are.  This moves the decision about
>> whether checksums are enabled or disabled quite a but further away
>> from the data itself, and also allows you to determine (by catalog
>> inspection) which parts of the cluster do in fact have checksums.  It
>> might be kind of a pain to implement, though: you'd have to pass the
>> information about how any given relation was configured down to the
>> place where we validate page sanity.  I'm not sure whether that's
>> practical.
>
> This patch implies future opportunities to flesh out its UI, and I don't see
> it locking us out of implementing the above.

Agreed

> We'll want a weak-lock command
> that adds checksums to pages lacking them.

VACUUM FREEZE will do this.

> We'll want to expose whether a
> given relation has full checksum coverage.

Not sure I understand why we'd want that. If you really want that, its
a simple function to do it.

> With that, we could produce an
> error when an apparently-non-checksummed page appears in a relation previously
> known to have full checksum coverage.

Additional checking at relation level is just going to slow things
down. Don't see the value of having relation level checksum option.

It would be an easy thing to skip checksums on UNLOGGED relations, and
possibly desirable, but I don't see the utility of a per relation
checksum setting in other cases. There's just no need for fine tuning
like that.

> Even supposing an "ALTER TABLE t SET {WITH | WITHOUT} CHECKSUMS" as the only
> tool for enabling or disabling them, it's helpful to have each page declare
> whether it has a checksum.  That way, the rewrite can take as little as an
> AccessShareLock, and any failure need not lose work already done.  If you rely
> on anticipating the presence of a checksum based on a pg_class column, that
> ALTER needs to perform an atomic rewrite.

That will pretty much kill the usability of this feature. We must be
able to add checksums without taking a full lock on the table and/or
database. Making it an ALTER TABLE would require huge maintenance
effort to add checksums to a database.

-1 to the idea of having ALTER TABLE variant to enable/disable checksums

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jan Urbański 2012-02-22 12:37:25 Re: Potential reference miscounts and segfaults in plpython.c
Previous Message Greg Smith 2012-02-22 11:53:49 pg_test_timing tool for EXPLAIN ANALYZE overhead