Re: 16-bit page checksums for 9.2

From: Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>
To: Ants Aasma <ants(dot)aasma(at)eesti(dot)ee>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, simon(at)2ndquadrant(dot)com, heikki(dot)linnakangas(at)enterprisedb(dot)com, aidan(at)highrise(dot)ca, stark(at)mit(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 16-bit page checksums for 9.2
Date: 2011-12-29 23:42:51
Message-ID: CAP-rdTZGLBAsrq1aLO2EytLoUk6Vx3VzaMkjLQeYHLPpBoeQ5Q@mail.gmail.com
Lists: pgsql-hackers

2011/12/30 Ants Aasma <ants(dot)aasma(at)eesti(dot)ee>:

> On Thu, Dec 29, 2011 at 6:44 PM, Kevin Grittner
> <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
>
>> positives.  To get this right for a checksum in the page header,
>> double-write would need to be used for all cases where
>> full_page_writes now are used (i.e., the first write of a page after
>> a checkpoint), and for all unlogged writes (e.g., hint-bit-only
>> writes).  There would be no correctness problem for always using
>> double-write, but it would be unnecessary overhead for other page
>> writes, which I think we can avoid.
>
> Unless I'm missing something, double-writes are needed for all writes,
> not only the first page after a checkpoint. Consider this sequence of
> events:
>
> 1. Checkpoint
> 2. Double-write of page A (DW buffer write, sync, heap write)
> 3. Sync of heap, releasing DW buffer for new writes.
>  ... some time goes by
> 4. Regular write of page A
> 5. OS writes one part of page A
> 6. Crash!
>
> Now recovery comes along, page A is broken in the heap with no
> double-write buffer backup nor anything to recover it by in the WAL.

I guess the assumption is that the write in (4) is either backed by
the WAL or made safe by double-writing. ISTM that such reasoning is
only correct if the change described by the WAL record can be applied
to an inconsistent (i.e., partially written) page, which I assume is
not the case (excuse my ignorance of such basic facts).

So I think you are right.
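
To make the failure in steps 4-6 concrete, here is a toy model. It is
only a sketch, not PostgreSQL code: the page layout (an 8-byte LSN at
the start of the page), the 512-byte sector size, and every function
name are assumptions made up for illustration. It also assumes that
redo skips a record when the page's LSN is already at or past the
record's LSN, and that the WAL holds only incremental records for
page A (no full-page image, no double-write copy).

```python
import struct

PAGE_SIZE = 8192
SECTOR_SIZE = 512   # assumed unit of atomic write; hypothetical


def make_page(lsn, fill):
    """Build a toy page: 8-byte LSN header followed by filler bytes."""
    return struct.pack("<Q", lsn) + bytes([fill]) * (PAGE_SIZE - 8)


def page_lsn(page):
    """Read the LSN stored in the first 8 bytes of the toy page."""
    return struct.unpack_from("<Q", page)[0]


def apply_record(page, lsn, offset, data):
    """Apply an incremental change and stamp the page with its LSN."""
    body = page[:offset] + data + page[offset + len(data):]
    return struct.pack("<Q", lsn) + body[8:]


def redo(page, records):
    """Replay WAL records, skipping those the page claims to contain."""
    for lsn, offset, data in records:
        if page_lsn(page) >= lsn:
            continue  # page header says this change is already applied
        page = apply_record(page, lsn, offset, data)
    return page


def torn_write(old, new, sectors_written):
    """Only the first `sectors_written` sectors of `new` reach disk."""
    cut = sectors_written * SECTOR_SIZE
    return new[:cut] + old[cut:]


def simulate():
    on_disk = make_page(lsn=10, fill=0)          # state after steps 2-3
    records = [(11, 100, b"AAAA"), (12, 5000, b"BBBB")]
    intended = on_disk
    for lsn, offset, data in records:            # changes in shared buffers
        intended = apply_record(intended, lsn, offset, data)
    torn = torn_write(on_disk, intended, 4)      # steps 4-6: partial write
    recovered = redo(torn, records)              # crash recovery
    return intended, torn, recovered
```

In this model the torn page carries the new LSN in its first sector
but the old bytes at offset 5000, so redo skips both records and the
second change is lost. Replaying the same records against the intact
pre-write page does reproduce the intended contents, which is why some
consistent copy of the page is needed as a starting point.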

Nicolas

--
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?
