Re: 16-bit page checksums for 9.2

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 16-bit page checksums for 9.2
Date: 2012-02-06 17:59:59
Message-ID: 20120206175959.GD19450@momjian.us
Lists: pgsql-hackers

On Mon, Feb 06, 2012 at 09:05:19AM +0000, Simon Riggs wrote:
> > In any case, I think it's a very bad idea to remove the page version field.
> > How are we supposed to ever be able to change the page format again if we
> > don't even have a version number on the page? I strongly oppose removing it.
>
> Nobody is removing the version field, nor is anybody suggesting not
> being able to tell which page version we are looking at.

Agreed. I thought the idea was that we have a 16-bit page _version_
number and 8+ free page _flag_ bits, which are all currently zero. The
idea was to move the version number from the 16-bit field into the unused
flag bits, and use the 16-bit field for the checksum. I would like to
have some logic that would allow tools inspecting the page to tell if
they should look for the version number in the bits at the beginning of
the page or at the end.

Specifically, this becomes the checksum:

uint16 pd_pagesize_version;

and this holds the page version, if we have updated the page to the new
format:

uint16 pd_flags; /* flag bits, see below */

Of the 16 bits of pd_flags, these are the only ones used:

#define PD_HAS_FREE_LINES 0x0001 /* are there any unused line pointers? */
#define PD_PAGE_FULL 0x0002 /* not enough free space for new tuple? */
#define PD_ALL_VISIBLE 0x0004 /* all tuples on page are visible to everyone */

#define PD_VALID_FLAG_BITS 0x0007 /* OR of all valid pd_flags bits */

> > I'm also not very comfortable with the idea of having flags on the page
> > indicating whether it has a checksum or not. It's not hard to imagine a
> > software or firmware bug or hardware failure that would cause pd_flags field
> > to be zeroed out altogether. It would be more robust if the expected
> > bit-pattern was not 0-0, but 1-0 or 0-1, but then you need to deal with that
> > at upgrade somehow. And it still feels a bit whacky anyway.

> Good idea. Let's use
>
> 0-0-0 to represent upgraded from previous version, needs a bit set
> 0-0-1 to represent new version number of page, no checksum
> 1-1-1 to represent new version number of page, with checksum
>
> So we have 1 bit dedicated to the page version, 2 bits to the checksum indicator

Interesting point that we would not be guarding against a bit flip from
1 to 0 for the checksum bit; I agree using two bits is the way to go. I
don't see how upgrade figures into this.

However, I am unclear how Simon's idea above actually works. We need
two bits for redundancy, both 1, to mark a page as having a checksum. I
don't think mixing the idea of a new page version and checksum enabled
really makes sense, especially since we have to plan for future page
version changes.
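
A minimal sketch of the two-bit redundancy, to make the intent concrete.
The bit positions and all names here are hypothetical, not existing
PostgreSQL macros. The point is only that a page is treated as
checksummed when both bits are 1, and that a mismatch between the two
bits is reported rather than silently read as "no checksum":

```c
#include <stdint.h>

/* Hypothetical bit positions, chosen only for illustration. */
#define PD_CHECKSUM_BIT1 0x0008
#define PD_CHECKSUM_BIT2 0x0010
#define PD_CHECKSUM_BITS (PD_CHECKSUM_BIT1 | PD_CHECKSUM_BIT2)

typedef enum
{
    CHECKSUM_ABSENT,    /* both bits clear: no checksum on this page */
    CHECKSUM_PRESENT,   /* both bits set: checksum should be verified */
    CHECKSUM_SUSPECT    /* bits disagree: one of them was probably flipped */
} ChecksumState;

/* Classify a page by its two checksum-indicator bits in pd_flags. */
static ChecksumState
checksum_state(uint16_t pd_flags)
{
    uint16_t bits = pd_flags & PD_CHECKSUM_BITS;

    if (bits == PD_CHECKSUM_BITS)
        return CHECKSUM_PRESENT;
    if (bits == 0)
        return CHECKSUM_ABSENT;
    return CHECKSUM_SUSPECT;
}
```

A single flip from 1 to 0 on a checksummed page lands in the "suspect"
state here. Note that this still does not catch pd_flags being zeroed
out altogether, which is the case Heikki raised.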

I think we dedicate 2 bits to say we have computed a checksum, and 3
bits to mark up to 8 possible page versions, so the logic is, in
pd_flags, we use bits 0x0008 and 0x0010 to indicate that a checksum is
stored on the page, and we use 0x0020 and later for the page version
number. We can assume all the remaining bits are for the page version
number until we need to define new bits, and we can start storing them
at the end first, and work forward. If all the version bits are zero, it
means the page version number is still stored in pd_pagesize_version.
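
Roughly, a tool inspecting a page could find the version like this. This
is only a sketch under assumptions: the version field is taken to be
three bits starting at 0x0020 (the single-bit value 32), the struct is a
cut-down stand-in for the real page header, and every name below is
made up for the example. The one existing convention it relies on is
that the layout version currently sits in the low byte of
pd_pagesize_version:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: three version bits in pd_flags, starting at 0x0020. */
#define PD_VERSION_SHIFT 5
#define PD_VERSION_MASK  0x00E0

/* Cut-down stand-in for the page header; only the two fields we need. */
typedef struct
{
    uint16_t pd_flags;
    uint16_t pd_pagesize_version;   /* checksum on new-format pages */
} PageHeaderSketch;

typedef struct
{
    bool     new_scheme;    /* true if the number came from pd_flags */
    unsigned number;        /* version number under that scheme */
} PageVersionSketch;

/* Look in the flag bits first; fall back to the old location if zero. */
static PageVersionSketch
page_version(const PageHeaderSketch *hdr)
{
    PageVersionSketch result;
    unsigned flag_version;

    flag_version = (hdr->pd_flags & PD_VERSION_MASK) >> PD_VERSION_SHIFT;
    if (flag_version != 0)
    {
        result.new_scheme = true;
        result.number = flag_version;
    }
    else
    {
        /* Old format: version is the low byte of pd_pagesize_version. */
        result.new_scheme = false;
        result.number = hdr->pd_pagesize_version & 0x00FF;
    }
    return result;
}
```

With zero reserved to mean "look in the old location", three bits leave
seven usable new-scheme version numbers, which is part of why leaving
room to grow into the higher bits matters.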

> >> I wonder if we should just dedicate 3 page header bits, call that the
> >> page version number, and set this new version number to 1, and assume
> >> all previous versions were zero, and have them look in the old page
> >> version location if the new version number is zero.  I am basically
> >> thinking of how we can plan ahead to move the version number to a new
> >> location and have a defined way of finding the page version number using
> >> old and new schemes.
> >
> >
> > Three bits seems short-sighted, but yeah, something like 6-8 bits should be
> > enough. On the whole, though, I think we should bite the bullet and invent a
> > way to extend the page header at upgrade.
>
> There are currently many spare bits. I don't see any need to allocate
> them to this specific use ahead of time - especially since that is the
> exact decision we took last time when we reserved 16 bits for the
> version.

Right, but I am thinking we should set things up so we can grow the page
version number into the unused bits, rather than box it in between bits we
are already using.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +
