Quick Links

Re: 16-bit page checksums for 9.2

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Noah Misch <noah(at)leadboat(dot)com>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, david(at)fetter(dot)org, aidan(at)highrise(dot)ca, stark(at)mit(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: 16-bit page checksums for 9.2
Date:	2012-02-16 11:16:31
Message-ID:	CA+U5nMJ-Sc5rrSQqu5ed1sbUbM8Zr-BcZYwhqazSB3q99aFOmw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Thu, Feb 9, 2012 at 3:16 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Wed, Feb 8, 2012 at 2:05 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
>> On Wed, Feb 08, 2012 at 11:40:34AM +0000, Simon Riggs wrote:
>>> On Wed, Feb 8, 2012 at 3:24 AM, Noah Misch <noah(at)leadboat(dot)com> wrote:
>>> > On Tue, Feb 07, 2012 at 08:58:59PM +0000, Simon Riggs wrote:
>>> >> On Thu, Jan 26, 2012 at 8:20 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
>>> >> > This patch uses FPIs to guard against torn hint writes, even when the
>>> >> > checksums are disabled. ?One could not simply condition them on the
>>> >> > page_checksums setting, though. ?Suppose page_checksums=off and we're hinting
>>> >> > a page previously written with page_checksums=on. ?If that write tears,
>>> >> > leaving the checksum intact, that block will now fail verification. ?A couple
>>> >> > of ideas come to mind. ?(a) Read the on-disk page and emit an FPI only if the
>>> >> > old page had a checksum. ?(b) Add a checksumEnableLSN field to pg_control.
>>> >> > Whenever the cluster starts with checksums disabled, set the field to
>>> >> > InvalidXLogRecPtr. ?Whenever the cluster starts with checksums enabled and the
>>> >> > field is InvalidXLogRecPtr, set the field to the next LSN. ?When a checksum
>>> >> > failure occurs in a page with LSN older than the stored one, emit either a
>>> >> > softer warning or no message at all.
>>> >>
>>> >> We can only change page_checksums at restart (now) so the above seems moot.
>>> >>
>>> >> If we crash with FPWs enabled we repair any torn pages.
>>> >
>>> > There's no live bug, but that comes at a high cost: the patch has us emit
>>> > full-page images for hint bit writes regardless of the page_checksums setting.
>>>
>>> Sorry, I don't understand what you mean. I don't see any failure cases
>>> that require that.
>>>
>>> page_checksums can only change at a shutdown checkpoint,
>>>
>>> The statement "If that write tears,
>>> >> > leaving the checksum intact, that block will now fail verification."
>>> cannot happen, ISTM.
>>>
>>> If we write out a block we update the checksum if page_checksums is
>>> set, or we clear it. If we experience a torn page at crash, the FPI
>>> corrects that, so the checksum never does fail verification. We only
>>> need to write a FPI when we write with checksums.
>>>
>>> If that's wrong, please explain a failure case in detail.
>>
>> In the following description, I will treat a heap page as having two facts.
>> The first is either CHKSUM or !CHKSUM, and the second is either HINT or !HINT.
>> A torn page write lacking the protection of a full-page image can update one
>> or both facts despite the transaction having logically updated both. (This is
>> just the normal torn-page scenario.) Given that, consider the following
>> sequence of events:
>>
>> 1. startup with page_checksums = on
>> 2. update page P with facts CHKSUM, !HINT
>> 3. clean write of P
>> 4. clean shutdown
>>
>> 5. startup with page_checksums = off
>> 6. update P with facts !CHKSUM, HINT. no WAL since page_checksums=off
>> 7. begin write of P
>> 8. crash. torn write of P only updates HINT. disk state now CHKSUM, HINT
>>
>> 9. startup with page_checksums = off
>> 10. crash recovery. does not touch P, because step 6 generated no WAL
>> 11. clean shutdown
>>
>> 12. startup with page_checksums = on
>> 13. read P. checksum failure
>>
>> Again, this cannot happen with checksum16_with_wallogged_hint_bits.v7.patch,
>> because step 6 _does_ write WAL. That ought to change for the sake of
>> page_checksums=off performance, at which point we must protect the above
>> scenario by other means.
>
> Thanks for explaining.
>
> Logic fixed.
>
>>> >> > PageSetLSN() is not atomic, so the shared buffer content lock we'll be holding
>>> >> > is insufficient.
>>> >>
>>> >> Am serialising this by only writing PageLSN while holding buf hdr lock.
>>> >
>>> > That means also taking the buffer header spinlock in every PageGetLSN() caller
>>> > holding only a shared buffer content lock. ?Do you think that will pay off,
>>> > versus the settled pattern of trading here your shared buffer content lock for
>>> > an exclusive one?
>>>
>>> Yes, I think it will pay off. This is the only code location where we
>>> do that, and we are already taking the buffer header lock, so there is
>>> effectively no overhead.
>>
>> The sites I had in the mind are the calls to PageGetLSN() in FlushBuffer()
>> (via BufferGetLSN()) and XLogCheckBuffer(). We don't take the buffer header
>> spinlock at either of those locations. If they remain safe under the new
>> rules, we'll need comments explaining why. I think they may indeed be safe,
>> but it's far from obvious.
>
> OK, some slight rework required to do that, no problems.
>
> I checked all other call sites and have updated README to explain.

v8 attached

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment	Content-Type	Size
checksum16_with_wallogged_hint_bits.v8c.patch	text/x-diff	69.7 KB

In response to

Re: 16-bit page checksums for 9.2 at 2012-02-09 15:16:04 from Simon Riggs

Responses

Re: 16-bit page checksums for 9.2 at 2012-02-16 11:48:13 from Albert Cervera i Areny
Re: 16-bit page checksums for 9.2 at 2012-02-16 13:53:48 from Robert Haas
Re: 16-bit page checksums for 9.2 at 2012-02-19 21:49:48 from Simon Riggs

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Fujii Masao	2012-02-16 11:31:04	Re: Scaling XLog insertion (was Re: Moving more work outside WALInsertLock)
Previous Message	Fujii Masao	2012-02-16 09:15:42	Re: Scaling XLog insertion (was Re: Moving more work outside WALInsertLock)