Re: 16-bit page checksums for 9.2

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Ants Aasma <ants(dot)aasma(at)eesti(dot)ee>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, simon(at)2ndquadrant(dot)com, heikki(dot)linnakangas(at)enterprisedb(dot)com, aidan(at)highrise(dot)ca, stark(at)mit(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 16-bit page checksums for 9.2
Date: 2012-01-04 01:49:42
Message-ID: CA+TgmoY+r-EVs3zskY5_wE_EXxs9yvG-0im531==UM_PHuCbmw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Dec 30, 2011 at 11:58 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> On 12/29/11, Ants Aasma <ants(dot)aasma(at)eesti(dot)ee> wrote:
>> Unless I'm missing something, double-writes are needed for all writes,
>> not only the first page after a checkpoint. Consider this sequence of
>> events:
>>
>> 1. Checkpoint
>> 2. Double-write of page A (DW buffer write, sync, heap write)
>> 3. Sync of heap, releasing DW buffer for new writes.
>>  ... some time goes by
>> 4. Regular write of page A
>> 5. OS writes one part of page A
>> 6. Crash!
>>
>> Now recovery comes along, page A is broken in the heap with no
>> double-write buffer backup nor anything to recover it by in the WAL.
>
> Isn't 3 the very definition of a checkpoint, meaning that 4 is not
> really a regular write as it is the first one after a checkpoint?

I think you nailed it.

> But it doesn't seem safe to me to replace a page from the DW buffer and
> then apply WAL to that replaced page which preceded the age of the
> page in the buffer.

That's what LSNs are for.
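
To illustrate the LSN point, here is a minimal sketch of the check that makes replay safe when the restored page is newer than some of the WAL being replayed. The names (Page, WalRecord, redo, replay) are hypothetical and chosen for illustration only; this is not PostgreSQL's actual code. It assumes each page carries the LSN of the last record applied to it, and that a record is skipped when the page's LSN is already at or past the record's LSN.

```python
from dataclasses import dataclass


@dataclass
class Page:
    lsn: int   # LSN of the last WAL record reflected in this page
    data: str


@dataclass
class WalRecord:
    lsn: int
    new_data: str   # simplified: the record carries the new page contents


def redo(page: Page, rec: WalRecord) -> bool:
    """Apply rec to page unless the page already reflects it.

    Returns True if the record was applied, False if it was skipped.
    """
    if rec.lsn <= page.lsn:
        # The page is at least as new as this record, so applying it
        # again would move the page backwards. Skip it.
        return False
    page.data = rec.new_data
    page.lsn = rec.lsn
    return True


def replay(page: Page, records: list[WalRecord]) -> list[int]:
    """Replay records in order; return the LSNs that were actually applied."""
    return [rec.lsn for rec in records if redo(page, rec)]
```

With a page restored from the double-write buffer at LSN 20, replaying records with LSNs 10, 20 and 30 applies only the record at LSN 30, so WAL that predates the restored image does no harm.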

If we write the page to the double-write buffer just once per
checkpoint, recovery can restore the double-written versions of the
pages and then begin WAL replay, which will restore all the subsequent
changes made to the page. Recovery may also need to do additional
double-writes if it encounters pages for which we wrote WAL but
never flushed the buffer, because a crash during recovery can also
create torn pages. When we reach a restartpoint, we fsync everything
down to disk and then nuke the double-write buffer. Similarly, in
normal running, we can nuke the double-write buffer at checkpoint
time, once the fsyncs are complete.
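
The scheme described above can be sketched as a toy model, under loud assumptions: the Store class and its methods are hypothetical, "disk" is a pair of dictionaries, a torn page is represented as None, WAL records are simple "append this item" deltas, and a checkpoint is assumed to discard both the double-write buffer and the WAL that precedes it. The additional double-writes that recovery itself may need are not modelled.

```python
class Store:
    def __init__(self):
        self.heap = {}   # page id -> (lsn, items), or None if the page is torn
        self.dw = {}     # double-write buffer: page id -> (lsn, items)
        self.wal = []    # (lsn, page id, item) records since the last checkpoint

    def checkpoint(self):
        # Assumption: every heap write has been fsynced by this point, so the
        # double-write copies and the older WAL are no longer needed.
        self.dw.clear()
        self.wal.clear()

    def log(self, lsn, pid, item):
        self.wal.append((lsn, pid, item))

    def write_page(self, pid, lsn, items, torn=False):
        # Only the first write of a page after a checkpoint is double-written;
        # in a real system the DW write is synced before the heap write starts.
        if pid not in self.dw:
            self.dw[pid] = (lsn, tuple(items))
        # A torn write leaves the heap copy unusable.
        self.heap[pid] = None if torn else (lsn, tuple(items))

    def recover(self):
        # Step 1: restore the double-written versions over the heap.
        for pid, image in self.dw.items():
            self.heap[pid] = image
        # Step 2: replay WAL, using the page LSN to skip records that the
        # restored page already reflects.
        for lsn, pid, item in self.wal:
            page_lsn, items = self.heap.get(pid, (0, ()))
            if lsn > page_lsn:
                self.heap[pid] = (lsn, items + (item,))
```

In this model, a page that is double-written at LSN 10 and then torn by a later ordinary write at LSN 20 comes back from recovery with both changes: the LSN 10 image is restored from the double-write buffer, the LSN 10 record is skipped, and the LSN 20 record is reapplied from WAL.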

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
