Re: Page Checksums

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Page Checksums
Date: 2011-12-19 18:46:25
Message-ID: 4EEF8681.8020903@2ndQuadrant.com
Lists: pgsql-hackers

On 12/19/2011 07:50 AM, Robert Haas wrote:
> On Mon, Dec 19, 2011 at 6:10 AM, Simon Riggs<simon(at)2ndquadrant(dot)com> wrote:
>> The only sensible way to handle this is to change the page format as
>> discussed. IMHO the only sensible way that can happen is if we also
>> support an online upgrade feature. I will take on the online upgrade
>> feature if others work on the page format issues, but none of this is
>> possible for 9.2, ISTM.
> I'm not sure that I understand the dividing line you are drawing here.

There are three likely steps to reaching checksums:

1) Build a checksum mechanism into the database. This is the
straightforward part that multiple people have now done.

2) Rework hint bits to make the torn page problem go away. Checksums go
elsewhere? More WAL logging to eliminate the bad situations? Eliminate
some types of hint bit writes? It seems every alternative has
trade-offs that will require serious performance testing to really validate.

3) Finally tackle in-place upgrades that include a page format change.
One basic mechanism was already outlined: a page converter that knows
how to handle two page formats, some metadata to track which pages have
been converted, a daemon to do background conversions. Simon has some
new ideas here too ("online upgrade" involves two clusters kept in sync
on different versions, slightly different concept than the current
"in-place upgrade"). My recollection is that the in-place page upgrade
work was pushed out of the critical path before due to lack of immediate
need. It wasn't necessary until a) a working catalog upgrade tool was
validated and b) a bite-size feature change to test it on appeared. We
have (a) now in pg_upgrade, and CRCs could be (b)--if the hint bit
issues are sorted first.
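
To make the dependency between (1) and (2) concrete, here is a minimal
sketch of a per-page checksum and of how a torn write after an unlogged
hint bit change defeats it. This is an illustration only, not PostgreSQL
code: the checksum position (first 4 bytes of the page), the use of CRC-32
from zlib, the 512-byte atomic sector size, and every function name are
assumptions made for the example.

```python
import zlib

PAGE_SIZE = 8192      # PostgreSQL's default block size
SECTOR_SIZE = 512     # assumed atomic write unit of the disk
CHECKSUM_BYTES = 4    # hypothetical layout: CRC-32 in the first 4 bytes


def compute_checksum(page: bytes) -> int:
    """CRC-32 over everything after the checksum field."""
    return zlib.crc32(page[CHECKSUM_BYTES:]) & 0xFFFFFFFF


def stamp(page: bytearray) -> None:
    """Store the checksum in the page before it is written out."""
    page[:CHECKSUM_BYTES] = compute_checksum(page).to_bytes(CHECKSUM_BYTES, "little")


def verify(page: bytes) -> bool:
    """Check the stored checksum against the page contents."""
    stored = int.from_bytes(page[:CHECKSUM_BYTES], "little")
    return stored == compute_checksum(page)


def torn_write(old: bytes, new: bytes, sectors_written: int) -> bytes:
    """Simulate a crash after only the first `sectors_written` sectors landed."""
    cut = sectors_written * SECTOR_SIZE
    return new[:cut] + old[cut:]


# Version of the page already on disk, with a valid checksum.
on_disk = bytearray(PAGE_SIZE)
stamp(on_disk)

# A hint bit is set near the end of the page; the change is not WAL-logged.
in_memory = bytearray(on_disk)
in_memory[PAGE_SIZE - 1] |= 0x01
stamp(in_memory)

# Crash mid-write: the first sector (new checksum) lands, the last one
# (the hint bit) does not, so the checksum no longer matches the contents.
after_crash = torn_write(bytes(on_disk), bytes(in_memory), sectors_written=1)

print(verify(on_disk), verify(in_memory), verify(after_crash))
```

In this sketch the torn page fails verification even though no user data
was damaged, and because the hint bit change was never WAL-logged there is
nothing to replay to repair it. That false alarm is what step (2) has to
eliminate.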

What Simon was saying is that he's got some interest in (3), but wants
no part of (2).

I don't know how much time each of these will take. I would expect that
(2) and (3) have similar scopes though--many days, possibly a few
months, of work--which means they both dwarf (1). The part that's been
done is the visible tip of a mostly underwater iceberg.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
