Re: Page Checksums + Double Writes

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, alvherre(at)commandprompt(dot)com, david(at)fetter(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Page Checksums + Double Writes
Date: 2011-12-22 21:58:20
Message-ID: CA+U5nMJ=hJKZ9HV=Y26kgQqsrnByYsy1ddiX3PdtiyGciJ9iUg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Dec 22, 2011 at 9:50 AM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:

> Simon, does it sound like I understand your proposal?

Yes, thanks for restating.

> Now, on to the separate-but-related topic of double-write.  That
> absolutely requires some form of checksum or CRC to detect torn
> pages, in order for the technique to work at all.  Adding a CRC
> without double-write would work fine if you have a storage stack
> which prevents torn pages in the file system or hardware driver.  If
> you don't have that, it could create a damaged page indication after
> a hardware or OS crash, although I suspect that would be the
> exception, not the typical case.  Given all that, and the fact that
> it would be cleaner to deal with these as two separate patches, it
> seems the CRC patch should go in first.  (And, if this is headed for
> 9.2, *very soon*, so there is time for the double-write patch to
> follow.)

It could work that way, but I seriously doubt that a technique first
mentioned only one month before the last CF can become trustworthy
code within a month. We've been discussing CRCs for years, so
assembling that puzzle seems much easier now that all the parts are
available.
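
For illustration, the torn-page detection discussed in the quoted
paragraph above can be sketched as follows. This is only a minimal
sketch under stated assumptions, not the proposed patch: the 4-byte
CRC at the start of the page, the 512-byte sector size and the helper
names stamp_page, page_is_intact and tear are all hypothetical.

```python
import zlib

PAGE_SIZE = 8192   # PostgreSQL's default block size
CRC_SIZE = 4       # assumption: CRC-32 stored in the first 4 bytes


def stamp_page(payload: bytes) -> bytes:
    """Build a full page: CRC-32 of the payload, then the payload."""
    if len(payload) != PAGE_SIZE - CRC_SIZE:
        raise ValueError("payload must fill the page exactly")
    return zlib.crc32(payload).to_bytes(CRC_SIZE, "little") + payload


def page_is_intact(page: bytes) -> bool:
    """Recompute the CRC and compare it with the stored value."""
    stored = int.from_bytes(page[:CRC_SIZE], "little")
    return stored == zlib.crc32(page[CRC_SIZE:])


def tear(old_page: bytes, new_page: bytes,
         sector: int = 512, sectors_written: int = 3) -> bytes:
    """Simulate a crash after only some sectors of new_page hit disk."""
    cut = sector * sectors_written
    return new_page[:cut] + old_page[cut:]


old = stamp_page(b"A" * (PAGE_SIZE - CRC_SIZE))
new = stamp_page(b"B" * (PAGE_SIZE - CRC_SIZE))
torn = tear(old, new)
print(page_is_intact(old), page_is_intact(new), page_is_intact(torn))
```

The torn page carries the new CRC but a mix of old and new contents,
so the check fails. Without a second intact copy (WAL full-page image
or double-write buffer) the damage can be detected but not repaired.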

> It seems to me that the full_page_writes GUC could become an
> enumeration, with "off" having the current meaning, "wal" meaning
> what "on" now does, and "double" meaning that the new double-write
> technique would be used.  (It doesn't seem to make any sense to do
> both at the same time.)  I don't think we need a separate GUC to tell
> us *what* to protect against torn pages -- if not "off" we should
> always protect the first write of a page after checkpoint, and if
> "double" and write_page_crc (or whatever we call it) is "on", then we
> protect hint-bit-only writes.  I think.  I can see room to argue that
> with CRCs on we should do a full-page write to the WAL for a
> hint-bit-only change, or that we should add another GUC to control
> when we do this.
>
> I'm going to take a shot at writing a patch for background hinting
> over the holidays, which I think has benefit alone but also boosts
> the value of these patches, since it would reduce double-write
> activity otherwise needed to prevent spurious error when using CRCs.
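
As a rough illustration, the enumeration proposed in the quoted text
above could be modelled like this. It is a sketch of one reading of
the proposal, not existing PostgreSQL behaviour: the function
torn_page_protection and its parameters are hypothetical, and it
assumes that hint-bit-only writes are protected only in the
"double" plus write_page_crc case.

```python
from enum import Enum
from typing import Optional


class FullPageWrites(Enum):
    OFF = "off"        # current meaning of "off"
    WAL = "wal"        # what "on" does now
    DOUBLE = "double"  # proposed double-write technique


def torn_page_protection(mode: FullPageWrites,
                         write_page_crc: bool,
                         first_write_after_checkpoint: bool,
                         hint_bit_only: bool) -> Optional[str]:
    """Return "wal", "double", or None for a single page write."""
    if mode is FullPageWrites.OFF:
        return None
    if hint_bit_only:
        # Assumption: only double-write with CRCs covers these writes.
        if mode is FullPageWrites.DOUBLE and write_page_crc:
            return "double"
        return None
    # Ordinary change: protect the first write after a checkpoint.
    return mode.value if first_write_after_checkpoint else None


print(torn_page_protection(FullPageWrites.DOUBLE, True, False, True))
print(torn_page_protection(FullPageWrites.WAL, True, False, True))
```

The alternative mentioned in the quoted text, a WAL full-page write
for hint-bit-only changes when CRCs are on, would change the
hint_bit_only branch.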

I would suggest you examine how to have an array of N bgwriters, then
just slot the code for hinting into the bgwriter. That way a bgwriter
can set hints, calculate the CRC and write the page, in that order,
on a particular block. The hinting needs to be synchronised with the
writing to get the full benefit.
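
A toy model of that ordering, for illustration only. Nothing here is
bgwriter code: assign_blocks, process_block and run_writer are
hypothetical names, the "hint" is a stand-in flag bit, and the
modulo partitioning is just one assumed way to give each block to
exactly one of the N writers.

```python
import zlib


def assign_blocks(dirty_blocks, n_writers):
    """Partition block numbers so each block has exactly one writer."""
    queues = [[] for _ in range(n_writers)]
    for blkno in sorted(dirty_blocks):
        queues[blkno % n_writers].append(blkno)
    return queues


def process_block(blkno, page, storage, log):
    """Hint, then CRC, then write: the CRC covers the final contents."""
    page[0] |= 0x01                    # stand-in for setting hint bits
    log.append(("hint", blkno))
    crc = zlib.crc32(bytes(page))      # computed after hinting
    log.append(("crc", blkno))
    storage[blkno] = (crc, bytes(page))
    log.append(("write", blkno))


def run_writer(queue, buffers, storage):
    """Process one writer's blocks in sequence; return the step log."""
    log = []
    for blkno in queue:
        process_block(blkno, buffers[blkno], storage, log)
    return log


buffers = {blkno: bytearray(16) for blkno in range(6)}
storage = {}
for queue in assign_blocks(buffers, 3):
    run_writer(queue, buffers, storage)
```

Because the hint is set before the CRC is computed and the page is
written once, the hint does not dirty the page again after the write.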

If we want page checksums in 9.2, I'll need your help, so the hinting
may be a sidetrack.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
