Re: 16-bit page checksums for 9.2

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <ants(dot)aasma(at)eesti(dot)ee>,<nicolas(dot)barbier(at)gmail(dot)com>
Cc: <simon(at)2ndquadrant(dot)com>,<heikki(dot)linnakangas(at)enterprisedb(dot)com>, <aidan(at)highrise(dot)ca>, <stark(at)mit(dot)edu>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 16-bit page checksums for 9.2
Date: 2011-12-30 14:18:43
Message-ID: 4EFD73E30200002500044232@gw.wicourts.gov
Lists: pgsql-hackers

> Nicolas Barbier wrote:
> 2011/12/30 Ants Aasma :
>> Kevin Grittner wrote:
>>
>>> positives. To get this right for a checksum in the page header,
>>> double-write would need to be used for all cases where
>>> full_page_writes now are used (i.e., the first write of a page
>>> after a checkpoint), and for all unlogged writes (e.g.,
>>> hint-bit-only writes). There would be no correctness problem for
>>> always using double-write, but it would be unnecessary overhead
>>> for other page writes, which I think we can avoid.
>>
>> Unless I'm missing something, double-writes are needed for all
>> writes, not only the first page after a checkpoint. Consider this
>> sequence of events:
>>
>> 1. Checkpoint
>> 2. Double-write of page A (DW buffer write, sync, heap write)
>> 3. Sync of heap, releasing DW buffer for new writes.
>> ... some time goes by
>> 4. Regular write of page A
>> 5. OS writes one part of page A
>> 6. Crash!
>>
>> Now recovery comes along, page A is broken in the heap with no
>> double-write buffer backup nor anything to recover it by in the
>> WAL.
>
> I guess the assumption is that the write in (4) is either backed by
> the WAL, or made safe by double writing. ISTM that such reasoning
> is only correct if the change that is expressed by the WAL record
> can be applied in the context of inconsistent (i.e., partially
> written) pages, which I assume is not the case (excuse my ignorance
> regarding such basic facts).
>
> So I think you are right.

Hmm. It appears that I didn't think that through all the way. I see
two alternatives.

(1) We don't eliminate full_page_writes and we only need to use
double-writes for unlogged writes.
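
As a rough illustration of alternative (1), the choice of torn-page protection could be sketched as below. This is only a sketch under stated assumptions: the function name, its two boolean parameters, and the returned labels are hypothetical and do not correspond to anything in the PostgreSQL source.

```python
def protection_for_change(wal_logged: bool, first_change_since_checkpoint: bool) -> str:
    """Pick the torn-page protection for one page change under alternative (1).

    Hypothetical helper for illustration only; the names and the returned
    labels are assumptions, not PostgreSQL identifiers.
    """
    if not wal_logged:
        # Unlogged writes (e.g., hint-bit-only changes) have nothing in WAL
        # to repair a torn page, so they go through the double-write buffer.
        return "double-write"
    if first_change_since_checkpoint:
        # full_page_writes stays on: the first change to a page after a
        # checkpoint puts a full-page image in WAL.
        return "full-page image in WAL"
    # Later logged changes rely on the full-page image already in WAL
    # since the last checkpoint.
    return "covered by earlier full-page image"


if __name__ == "__main__":
    for logged, first in [(False, False), (True, True), (True, False)]:
        print(logged, first, "->", protection_for_change(logged, first))
```

In the sequence of events quoted above, the regular write in step 4 would fall in the last case if it is WAL-logged: a torn page A would be restored from the full-page image logged after the checkpoint in step 1.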

(2) We double-write all writes and on recovery we only apply WAL to
a page from pd_lsn onward. We would start from the same point and
follow the same rules except that when we read a page and find a
pd_lsn past the location of the record we are applying, we do nothing
because we are 100% sure everything to that point is safely written
and not torn. full_page_writes to WAL would not be needed.
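
The recovery rule in alternative (2) might look roughly like the following sketch. It assumes every page reaches disk untorn (because all writes are double-written) and that pd_lsn holds the LSN of the last WAL record applied to the page; a record is skipped when pd_lsn is at or past its LSN. The Page and WalRecord classes and the redo function are hypothetical stand-ins, not PostgreSQL structures.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Toy page: pd_lsn is the LSN of the last record applied to it."""
    pd_lsn: int = 0
    data: dict = field(default_factory=dict)


@dataclass
class WalRecord:
    """Toy WAL record that sets one key on the page."""
    lsn: int
    key: str
    value: str


def redo(page: Page, records: list) -> list:
    """Replay records in LSN order, skipping those the page already reflects.

    Returns the LSNs that were actually applied.
    """
    applied = []
    for rec in sorted(records, key=lambda r: r.lsn):
        if page.pd_lsn >= rec.lsn:
            # The page on disk is known to be intact and already contains
            # this change, so there is nothing to do.
            continue
        page.data[rec.key] = rec.value
        page.pd_lsn = rec.lsn
        applied.append(rec.lsn)
    return applied


if __name__ == "__main__":
    page = Page(pd_lsn=20, data={"a": "from-lsn-20"})
    wal = [WalRecord(10, "a", "from-lsn-10"),
           WalRecord(20, "a", "from-lsn-20"),
           WalRecord(30, "b", "from-lsn-30")]
    print(redo(page, wal), page)
```

The skip is only safe because the double-write guarantees the page is not torn; with a torn page, pd_lsn could claim changes that never fully reached disk.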

-Kevin
