Re: 16-bit page checksums for 9.2

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, david(at)fetter(dot)org, aidan(at)highrise(dot)ca, stark(at)mit(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 16-bit page checksums for 9.2
Date: 2012-02-09 15:16:04
Message-ID: CA+U5nM+CRsUepiGHnz4K9VBDnQnTspAzKSMEJ-Miy+M6C0x+Yg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Feb 8, 2012 at 2:05 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> On Wed, Feb 08, 2012 at 11:40:34AM +0000, Simon Riggs wrote:
>> On Wed, Feb 8, 2012 at 3:24 AM, Noah Misch <noah(at)leadboat(dot)com> wrote:
>> > On Tue, Feb 07, 2012 at 08:58:59PM +0000, Simon Riggs wrote:
>> >> On Thu, Jan 26, 2012 at 8:20 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
>> >> > This patch uses FPIs to guard against torn hint writes, even when the
>> >> > checksums are disabled. ?One could not simply condition them on the
>> >> > page_checksums setting, though. ?Suppose page_checksums=off and we're hinting
>> >> > a page previously written with page_checksums=on. ?If that write tears,
>> >> > leaving the checksum intact, that block will now fail verification. ?A couple
>> >> > of ideas come to mind. ?(a) Read the on-disk page and emit an FPI only if the
>> >> > old page had a checksum. ?(b) Add a checksumEnableLSN field to pg_control.
>> >> > Whenever the cluster starts with checksums disabled, set the field to
>> >> > InvalidXLogRecPtr. ?Whenever the cluster starts with checksums enabled and the
>> >> > field is InvalidXLogRecPtr, set the field to the next LSN. ?When a checksum
>> >> > failure occurs in a page with LSN older than the stored one, emit either a
>> >> > softer warning or no message at all.
>> >>
>> >> We can only change page_checksums at restart (now) so the above seems moot.
>> >>
>> >> If we crash with FPWs enabled we repair any torn pages.
>> >
>> > There's no live bug, but that comes at a high cost: the patch has us emit
>> > full-page images for hint bit writes regardless of the page_checksums setting.
>>
>> Sorry, I don't understand what you mean. I don't see any failure cases
>> that require that.
>>
>> page_checksums can only change at a shutdown checkpoint,
>>
>> The statement "If that write tears,
>> >> > leaving the checksum intact, that block will now fail verification."
>> cannot happen, ISTM.
>>
>> If we write out a block we update the checksum if page_checksums is
>> set, or we clear it. If we experience a torn page at crash, the FPI
>> corrects that, so the checksum never does fail verification. We only
>> need to write a FPI when we write with checksums.
>>
>> If that's wrong, please explain a failure case in detail.
>
> In the following description, I will treat a heap page as having two facts.
> The first is either CHKSUM or !CHKSUM, and the second is either HINT or !HINT.
> A torn page write lacking the protection of a full-page image can update one
> or both facts despite the transaction having logically updated both.  (This is
> just the normal torn-page scenario.)  Given that, consider the following
> sequence of events:
>
> 1. startup with page_checksums = on
> 2. update page P with facts CHKSUM, !HINT
> 3. clean write of P
> 4. clean shutdown
>
> 5. startup with page_checksums = off
> 6. update P with facts !CHKSUM, HINT.  no WAL since page_checksums=off
> 7. begin write of P
> 8. crash.  torn write of P only updates HINT.  disk state now CHKSUM, HINT
>
> 9. startup with page_checksums = off
> 10. crash recovery.  does not touch P, because step 6 generated no WAL
> 11. clean shutdown
>
> 12. startup with page_checksums = on
> 13. read P.  checksum failure
>
> Again, this cannot happen with checksum16_with_wallogged_hint_bits.v7.patch,
> because step 6 _does_ write WAL.  That ought to change for the sake of
> page_checksums=off performance, at which point we must protect the above
> scenario by other means.

Thanks for explaining.

Logic fixed.

>> >> > PageSetLSN() is not atomic, so the shared buffer content lock we'll be holding
>> >> > is insufficient.
>> >>
>> >> Am serialising this by only writing PageLSN while holding buf hdr lock.
>> >
>> > That means also taking the buffer header spinlock in every PageGetLSN() caller
>> > holding only a shared buffer content lock. ?Do you think that will pay off,
>> > versus the settled pattern of trading here your shared buffer content lock for
>> > an exclusive one?
>>
>> Yes, I think it will pay off. This is the only code location where we
>> do that, and we are already taking the buffer header lock, so there is
>> effectively no overhead.
>
> The sites I had in the mind are the calls to PageGetLSN() in FlushBuffer()
> (via BufferGetLSN()) and XLogCheckBuffer().  We don't take the buffer header
> spinlock at either of those locations.  If they remain safe under the new
> rules, we'll need comments explaining why.  I think they may indeed be safe,
> but it's far from obvious.

OK, some slight rework required to do that, no problems.

I checked all other call sites and have updated README to explain.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2012-02-09 15:17:59 Re: [COMMITTERS] pgsql: Add new keywords SNAPSHOT and TYPES to the keyword list in gram.
Previous Message Duncan Rance 2012-02-09 15:07:09 Re: BUG #6425: Bus error in slot_deform_tuple