Re: Checksums, state of play

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Checksums, state of play
Date: 2012-03-06 15:31:44
Message-ID: CA+U5nMLtuegEaS-poHFfYTWqxbksSAVS8WnZzuDCKwkMLLdoeg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Mar 6, 2012 at 2:25 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> 4. Checksums are being removed, but some blocks may still have them.
> Thus, it's not an error for a block to have no checksum, but any
> still-remaining checksums should be correct (though possibly we ought
> not to complain if they aren't, to provide a recovery path for users
> who are turning checksums off because they're getting errors they
> don't want).  Any block that's written is written without checksums.

I agree it's advantageous to have a means of removing pagesums from
data blocks as a 4th state.

I think we need a 5th state: pagesums disabled, which allows an
emergency disabling of the feature without rewriting the blocks.
Obviously, if the database is damaged and I/O devices are going bad,
trying to rescan the database is likely to cause further problems.
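As an illustration only, the five states could be modelled as a small C
enum with a per-state policy table for writes and reads. The names
(ChecksumState, CHECKSUMS_EMERGENCY_OFF, ChecksumPolicy,
checksum_policy()) are hypothetical, the meaning of states 1 to 3 is
inferred from the transitions Robert describes, and the write behaviour
in state 5 is an assumption; none of this is PostgreSQL code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; numbering follows the states used in this thread. */
typedef enum ChecksumState
{
    CHECKSUMS_OFF = 1,          /* 1: no page carries a checksum */
    CHECKSUMS_ENABLING = 2,     /* 2: being added; some pages lack one */
    CHECKSUMS_ON = 3,           /* 3: every page must carry a checksum */
    CHECKSUMS_DISABLING = 4,    /* 4: being removed; some pages still have one */
    CHECKSUMS_EMERGENCY_OFF = 5 /* 5: proposed; ignore checksums, no rescan */
} ChecksumState;

typedef struct ChecksumPolicy
{
    bool add_on_write;    /* does a page written now get a checksum? */
    bool missing_ok;      /* is a page without a checksum acceptable? */
    bool verify_existing; /* is a checksum that is present verified? */
} ChecksumPolicy;

/* Index 0 is unused; designated initializers keep the table readable. */
static const ChecksumPolicy policy[] = {
    [CHECKSUMS_OFF]           = {false, true,  false},
    [CHECKSUMS_ENABLING]      = {true,  true,  true},
    [CHECKSUMS_ON]            = {true,  false, true},
    /* Robert suggests state 4 might tolerate bad checksums instead. */
    [CHECKSUMS_DISABLING]     = {false, true,  true},
    /* Assumption: emergency-off neither verifies nor adds checksums. */
    [CHECKSUMS_EMERGENCY_OFF] = {false, true,  false},
};

static ChecksumPolicy
checksum_policy(ChecksumState s)
{
    assert(s >= CHECKSUMS_OFF && s <= CHECKSUMS_EMERGENCY_OFF);
    return policy[s];
}
```

The point of state 5 shows up in the last row: it stops all
verification immediately, without the table scan that states 2 and 4
depend on.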

> I think we need to be clear about how the system transitions between
> these states.  In the current patch, AIUI, you can effectively go from
> 1->2 or 4->2 by setting page_checksums=on and from 2->4 by setting
> page_checksums=off, but there's no easy way to ensure that you've
> reached state 3 or that you've gotten back to state 1.  Some variant
> of VACUUM seems like a good way of doing that, but it doesn't make
> sense for example to have page_checksums=off and do VACUUM (CHECKSUMS
> ON), or to have page_checksums=on and do VACUUM (CHECKSUMS OFF).  I
> guess we could just reject those as error cases, but it would be
> nicer, I think, to have an interface with a higher degree of
> orthogonality.

Right, a misunderstanding I think.

If we have the states set at database level, then we would not also
have a GUC.

> There's probably more than one way to do that, but my personal
> preference, as previously noted, is to make this a table-level option,
> rather than a GUC.  Then, VACUUM (CHECKSUMS ON) can first change the
> pg_class entry to indicate that checksums are enabling-in-progress
> (i.e. 1->2), then scan the table, adding checksums, and then mark
> checksums as fully enabled (i.e. 2->3).  VACUUM (CHECKSUMS OFF) can
> proceed in symmetric fashion, marking checksums as
> disabling-in-progress (3->4), then scanning the table and getting rid
> of them, and then marking them fully disabled (4->1).  If a crash
> happens in the middle somewhere, the state of the table can get left
> as enabling-in-progress or disabling-in-progress, but a new VACUUM
> (CHECKSUMS X) can be used to finish the process, and we always know
> exactly where we're at.
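A minimal sketch of the VACUUM (CHECKSUMS ON|OFF) sequence quoted above,
run against a hypothetical in-memory table rather than a real relation.
ToyTable, CsState and vacuum_checksums() are illustrative names, and the
crash_after parameter exists only to simulate a crash partway through
the scan so that the resume behaviour can be observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { CS_OFF = 1, CS_ENABLING = 2, CS_ON = 3, CS_DISABLING = 4 } CsState;

#define TOY_MAX_PAGES 8

/* Hypothetical stand-in for a table: a per-table checksum state (as it
 * might be stored in pg_class) plus one "has checksum" marker per page. */
typedef struct
{
    CsState state;
    bool    page_has_checksum[TOY_MAX_PAGES];
    size_t  npages;
} ToyTable;

/*
 * Mark the table in-progress, rewrite every page, then mark it done.
 * Returns false if the simulated crash fires (after crash_after pages),
 * leaving the table in the in-progress state.
 */
static bool
vacuum_checksums(ToyTable *t, bool enable, size_t crash_after)
{
    CsState in_progress = enable ? CS_ENABLING : CS_DISABLING;
    CsState done = enable ? CS_ON : CS_OFF;

    assert(t->npages <= TOY_MAX_PAGES);
    if (t->state == done)
        return true;                    /* already there; nothing to do */

    t->state = in_progress;             /* 1->2 or 3->4, or a resume */
    for (size_t i = 0; i < t->npages; i++)
    {
        if (i == crash_after)
            return false;               /* simulated crash mid-scan */
        t->page_has_checksum[i] = enable;   /* add or strip the checksum */
    }
    t->state = done;                    /* 2->3 or 4->1 */
    return true;
}
```

After a simulated crash the table stays in the in-progress state, and
calling vacuum_checksums() again in either direction drives it to a
fully enabled or fully disabled state, which is the "we always know
exactly where we're at" property.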

Any in-progress state needs to have checksums removed first, then re-added.

I'll keep an open mind for now about database level versus table
level. I'm not sure how feasible or desirable each is.

> I generally agree with this outline, though I think that in lieu of a
> version number we could simply set a new pd_flags bit indicating that
> checksums are enabled.  If we haven't fully enabled checksums yet,
> then the fact that this bit isn't set is not an error; but if
> checksums are fully enabled, then every page must have that bit set,
> and any page that doesn't is ipso facto corrupt.

Whether or not we have the bit, if corruption occurs during
checksum-enabling then we could get a false reading. If we have a bit,
then the bit itself can be set wrongly, so we could either make a check
when it wasn't due, or skip a check we should have made. If we don't
have a bit, and so skip checksum checking during the enabling process,
then an error can occur that the checksum process never spots.

Given it can happen both ways, we should choose to have a bit or not
depending upon which is least likely to be wrong. I would say having
the bit is the option least likely to give false readings.
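To make the trade-off concrete, here is a toy model of the pd_flags
approach. PD_HAS_CHECKSUM is a hypothetical flag value, ToyPage stands
in for a real page, and toy_checksum() is a plain Fletcher-16 chosen
only for illustration, not the algorithm in the patch. A page whose bit
has been lost is caught as corrupt once checksums are fully enabled, but
during enabling it is accepted unchecked, which is the kind of false
reading described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE   64
#define PD_HAS_CHECKSUM 0x0008  /* hypothetical pd_flags bit */

typedef enum { CS_OFF = 1, CS_ENABLING = 2, CS_ON = 3, CS_DISABLING = 4 } CsState;
typedef enum { PAGE_OK, PAGE_UNCHECKED, PAGE_CORRUPT } PageVerdict;

typedef struct
{
    uint16_t pd_flags;
    uint16_t checksum;
    uint8_t  data[TOY_PAGE_SIZE];
} ToyPage;

/* Toy Fletcher-16 over the data area; NOT PostgreSQL's algorithm. */
static uint16_t
toy_checksum(const ToyPage *p)
{
    uint16_t a = 0, b = 0;

    for (size_t i = 0; i < TOY_PAGE_SIZE; i++)
    {
        a = (uint16_t) ((a + p->data[i]) % 255);
        b = (uint16_t) ((b + a) % 255);
    }
    return (uint16_t) ((b << 8) | a);
}

/* On write: set checksum and bit in states 2/3, clear both otherwise. */
static void
toy_write_page(ToyPage *p, CsState s)
{
    if (s == CS_ENABLING || s == CS_ON)
    {
        p->checksum = toy_checksum(p);
        p->pd_flags |= PD_HAS_CHECKSUM;
    }
    else
    {
        p->checksum = 0;
        p->pd_flags &= (uint16_t) ~PD_HAS_CHECKSUM;
    }
}

/* On read: the bit decides whether the checksum is checked at all. */
static PageVerdict
toy_verify_page(const ToyPage *p, CsState s)
{
    assert(s >= CS_OFF && s <= CS_DISABLING);
    if (p->pd_flags & PD_HAS_CHECKSUM)
        return toy_checksum(p) == p->checksum ? PAGE_OK : PAGE_CORRUPT;
    /* No bit: only an error once checksums are fully enabled. */
    return s == CS_ON ? PAGE_CORRUPT : PAGE_UNCHECKED;
}
```

The remaining blind spot is the enabling window: there, a missing bit
cannot be told apart from a page the scan has not reached yet.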

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
