Re: 16-bit page checksums for 9.2

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <simon(at)2ndQuadrant(dot)com>,<heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: <aidan(at)highrise(dot)ca>,<stark(at)mit(dot)edu>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 16-bit page checksums for 9.2
Date: 2011-12-29 17:08:43
Message-ID: 4EFC4A3B02000025000441E2@gw.wicourts.gov
Lists: pgsql-hackers

Heikki Linnakangas wrote:
> Simon Riggs wrote:

>> OK, then we are talking at cross purposes. Double write buffers,
>> in the way you explain them allow us to remove full page writes.
>> They clearly don't do anything to check page validity on read.
>> Torn pages are not the only fault we wish to correct against...
>> and the double writes idea is orthogonal to the idea of checksums.
>
> The reason we're talking about double write buffers in this thread
> is that double write buffers can be used to solve the problem with
> hint bits and checksums.

Exactly. Every time the issue of page checksums is raised, there are
objections because OS or hardware crashes could cause torn pages for
hint-bit-only writes which would be treated as serious errors
(potentially indicating hardware failure) when they are in fact
expected and benign. Some time before the thread dies, someone
generally points out that double-write technology would be a graceful
way to handle that, with the side benefit of smaller WAL files. All
available evidence suggests it would also allow a small performance
improvement, although I hesitate to emphasize that aspect of it; the
other benefits fully justify the effort without that.

I do feel there is value in a page checksum patch even without torn
page protection. The discussion on the list has convinced me that a
failed checksum should be treated as seriously as other page format
errors, rather than as a warning, even though (in the absence of torn
page protection) torn hint-bit-only page writes would be benign.
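To make the mechanics concrete, a 16-bit page checksum of the kind under discussion can be sketched as below. This is only an illustration, not PostgreSQL's actual algorithm or page layout: Fletcher-16 is one plausible choice, and the `PAGE_SIZE` and `CHECKSUM_OFFSET` values are hypothetical. The key detail is that the stored checksum bytes are treated as zero while computing, so the value is stable regardless of what currently occupies that field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 8192        /* hypothetical page size */
#define CHECKSUM_OFFSET 8     /* hypothetical location of the 16-bit checksum */

/* Fletcher-16 over the page image, with the stored checksum bytes
 * masked to zero so the result does not depend on the field itself. */
static uint16_t page_checksum(const uint8_t *page)
{
    uint32_t sum1 = 0, sum2 = 0;
    for (size_t i = 0; i < PAGE_SIZE; i++)
    {
        uint8_t b = (i == CHECKSUM_OFFSET || i == CHECKSUM_OFFSET + 1)
                        ? 0 : page[i];
        sum1 = (sum1 + b) % 255;
        sum2 = (sum2 + sum1) % 255;
    }
    return (uint16_t) ((sum2 << 8) | sum1);
}

/* Stamp the checksum into the page before it is written out. */
static void page_set_checksum(uint8_t *page)
{
    uint16_t c = page_checksum(page);
    memcpy(page + CHECKSUM_OFFSET, &c, sizeof(c));
}

/* Verify a page just read from disk: recompute and compare. */
static int page_checksum_ok(const uint8_t *page)
{
    uint16_t stored;
    memcpy(&stored, page + CHECKSUM_OFFSET, sizeof(stored));
    return stored == page_checksum(page);
}
```

A hint-bit-only write flips bits after the checksum was stamped, which is exactly why a torn version of such a write fails verification even though the data is logically intact.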

As an example of how this might be useful, consider our central
databases which contain all the detail replicated from the circuit
court databases in all the counties. These are mission-critical, so
we have redundant servers in separate buildings. At one point, one
of them experienced hardware problems and we started seeing invalid
pages. Since we can shift the load between these servers without
downtime, we moved all applications to other servers, and
investigated. Now, it's possible that for some time before we got
errors on the bad pages, there could have been subtle corruption
which didn't generate errors but presented bad data on our web site.
A page checksum would help prevent that sort of problem, and a
post-crash false positive might waste a little time in investigation,
but that cost would be far outweighed by the benefit of better
accuracy guarantees.

Of course, it will be a big plus if we can roll this out in 9.2 in
conjunction with a double-write feature. Not only will double-write
probably be a bit faster than full_page_writes in the WAL, but it
will allow protection against torn pages on hint-bit-only writes
without adding those writes to the WAL or making any major
rearrangement of where the hint bits sit that would break
pg_upgrade. It
would be nice not to have to put all sorts of caveats and
explanations into the docs about how a checksum error might be benign
due to hint bit writes.
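The interaction described above can be sketched with a toy in-memory simulation, under the assumption of the usual double-write sequence (durably write the full page image to a dedicated area, then write it in place; on recovery, restore any page whose checksum fails from that area). The `fake_disk` layout and function names are invented for illustration; a real implementation would also fsync between the two writes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DW_PAGE 4096

/* A toy "disk": one double-write slot plus the page's home location. */
typedef struct {
    uint8_t dw_slot[DW_PAGE];
    uint8_t home[DW_PAGE];
} fake_disk;

/* Simple 16-bit checksum (Fletcher-16); illustrative only. */
static uint16_t sum16(const uint8_t *p, size_t n)
{
    uint32_t s1 = 0, s2 = 0;
    for (size_t i = 0; i < n; i++) {
        s1 = (s1 + p[i]) % 255;
        s2 = (s2 + s1) % 255;
    }
    return (uint16_t) ((s2 << 8) | s1);
}

/* Normal protocol: persist the full image in the double-write slot
 * first (an fsync would go here), then write the page in place. */
static void dw_write(fake_disk *d, const uint8_t *page)
{
    memcpy(d->dw_slot, page, DW_PAGE);
    memcpy(d->home, page, DW_PAGE);
}

/* Simulated crash: the in-place write tears, half new and half old. */
static void dw_write_torn(fake_disk *d, const uint8_t *page)
{
    memcpy(d->dw_slot, page, DW_PAGE);
    memcpy(d->home, page, DW_PAGE / 2); /* crash before the rest lands */
}

/* Recovery: a checksum failure on the home copy is repaired from the
 * double-write slot, so it never has to be treated as corruption. */
static void dw_recover(fake_disk *d, uint16_t expected)
{
    if (sum16(d->home, DW_PAGE) != expected)
        memcpy(d->home, d->dw_slot, DW_PAGE);
}
```

This is why double-write removes the false-positive problem: a torn page, whether from a WAL-logged write or a hint-bit-only write, is always repairable from the intact copy, so a checksum failure that survives recovery really does indicate corruption.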

-Kevin
