Re: 16-bit page checksums for 9.2

From: Ants Aasma <ants(dot)aasma(at)eesti(dot)ee>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, simon(at)2ndquadrant(dot)com, heikki(dot)linnakangas(at)enterprisedb(dot)com, aidan(at)highrise(dot)ca, stark(at)mit(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 16-bit page checksums for 9.2
Date: 2012-01-04 11:29:34
Message-ID: CA+CSw_vyuqdLNjFPh=wUF_ngrOAdw2D+kvVZ_L5BUCA2hgzwdQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 4, 2012 at 3:49 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Dec 30, 2011 at 11:58 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>> On 12/29/11, Ants Aasma <ants(dot)aasma(at)eesti(dot)ee> wrote:
>>> Unless I'm missing something, double-writes are needed for all writes,
>>> not only the first page after a checkpoint. Consider this sequence of
>>> events:
>>>
>>> 1. Checkpoint
>>> 2. Double-write of page A (DW buffer write, sync, heap write)
>>> 3. Sync of heap, releasing DW buffer for new writes.
>>>  ... some time goes by
>>> 4. Regular write of page A
>>> 5. OS writes one part of page A
>>> 6. Crash!
>>>
>>> Now recovery comes along, page A is broken in the heap with no
>>> double-write buffer backup nor anything to recover it by in the WAL.
>>
>> Isn't 3 the very definition of a checkpoint, meaning that 4 is not
>> really a regular write as it is the first one after a checkpoint?
>
> I think you nailed it.

No, I should have stated explicitly that no checkpoint happens in
between; step 3 is only the heap sync that completes the double-write,
not a checkpoint. I think the confusion arose because I assumed Kevin
was describing a fixed-size double-write buffer in this message:

On Thu, Dec 29, 2011 at 6:44 PM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> ...  The file is fsync'd (like I said,
> hopefully to BBU cache), then each page in the double-write buffer is
> written to the normal page location, and that is fsync'd.  Once that
> is done, the database writes have no risk of being torn, and the
> double-write buffer is marked as empty.  ...

If the contents of the double-write buffer survive until the next
checkpoint, double-writing only the first write of each page after a
checkpoint should work just fine. The advantage over the current
full-page writes is that the write does not go into the WAL stream and
is done (hopefully) by the bgwriter/checkpointer in the background.
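
To make the difference between the two readings concrete, here is a
minimal toy model of the event sequence quoted above. It is only a
sketch under stated assumptions, not PostgreSQL code: the class
DoubleWriteSim, its methods and the retain_dw_until_checkpoint flag are
hypothetical names made up for illustration, WAL records are modelled
as plain integer deltas, and a torn page is represented by None.

```python
TORN = None  # stand-in for an on-disk page whose checksum would fail


class DoubleWriteSim:
    """Toy model: double-write only the first flush of a page after a checkpoint."""

    def __init__(self, retain_dw_until_checkpoint):
        self.retain = retain_dw_until_checkpoint
        self.next_lsn = 1
        self.buffers = {}     # in-memory pages: page_id -> (lsn, value)
        self.heap = {}        # on-disk pages:   page_id -> (lsn, value) or TORN
        self.dw = {}          # double-write buffer: page_id -> (lsn, value)
        self.wal = []         # (lsn, page_id, delta) since the last checkpoint
        self.flushed = set()  # pages already flushed since the last checkpoint

    def checkpoint(self):
        # Assumes every dirty page has already been flushed intact.
        self.dw.clear()
        self.wal.clear()
        self.flushed.clear()

    def update(self, page_id, delta):
        # Modify the in-memory page and log a delta-only WAL record.
        lsn = self.next_lsn
        self.next_lsn += 1
        _, value = self.buffers.get(page_id, (0, 0))
        self.buffers[page_id] = (lsn, value + delta)
        self.wal.append((lsn, page_id, delta))

    def flush(self, page_id, torn=False):
        image = self.buffers[page_id]
        if page_id not in self.flushed:
            self.dw[page_id] = image  # DW buffer write + sync
            self.flushed.add(page_id)
        self.heap[page_id] = TORN if torn else image
        if not torn and not self.retain:
            # Fixed-size buffer reading: the heap sync releases the DW slot.
            self.dw.pop(page_id, None)

    def recover(self):
        lost = set()
        for page_id, image in list(self.heap.items()):
            if image is TORN:
                if page_id in self.dw:
                    self.heap[page_id] = self.dw[page_id]  # restore base image
                else:
                    lost.add(page_id)  # nothing to replay the WAL onto
        for lsn, page_id, delta in self.wal:
            if page_id in lost:
                continue
            page_lsn, value = self.heap.get(page_id, (0, 0))
            if lsn > page_lsn:  # skip records already reflected in the page
                self.heap[page_id] = (lsn, value + delta)
        return lost


def run(retain):
    sim = DoubleWriteSim(retain)
    sim.checkpoint()               # step 1
    sim.update("A", 5)
    sim.flush("A")                 # steps 2-3: double-write, then heap sync
    sim.update("A", 7)
    sim.flush("A", torn=True)      # steps 4-6: regular write, torn, crash
    return sim.recover(), sim.heap["A"]
```

In this model run(False) reports page A as lost, because the only
intact base image was released at step 3, while run(True) restores the
retained image and replays the later WAL record on top of it.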

--
Ants Aasma
