Re: Page Checksums

From: Greg Stark <stark(at)mit(dot)edu>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: greg(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Page Checksums
Date: 2011-12-25 22:18:51
Message-ID: CAM-w4HPUJC1mr1XxRBcykHVh8nSW9dWdK3LmQ=NrpNmeTQSF9w@mail.gmail.com
Lists: pgsql-hackers

On Mon, Dec 19, 2011 at 7:16 PM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> It seems to me that on a typical production system you would
> probably have zero or one such page per OS crash

Incidentally, I don't think this is right. There are really two kinds
of torn pages:

1) The kernel VM has many dirty 4k pages and decides to flush one 4k
page of a Postgres 8k buffer but not the other one. It doesn't sound
very logical for it to do this, but it has the same kind of tradeoffs
to make that Postgres does, and there could easily be cases where the
extra bookkeeping required to avoid it isn't deemed worthwhile. The
two memory pages might not even land on the same part of the disk
anyway, so flushing one and not the other might be reasonable.

In this case there could be an unbounded number of such torn pages, and
they can stay torn on disk for a long period of time, so a torn page
need not have been actively being written when the crash occurred. On
Linux these torn pages will always be on memory page boundaries, i.e.
4k blocks on x86.

2) The I/O system was in the process of writing out blocks and the
system lost power or crashed as they were being written out. In this
case there will probably only be 0 or 1 torn pages, perhaps as many
as the SCSI queue depth if there's some weird I/O scheduling going on.
In this case the torn page could be on a hardware block boundary,
often a 512-byte boundary (or, if the drive doesn't guarantee that a
block is written atomically, a disk block itself could be corrupted).
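
To make the difference in tear boundaries concrete, here is a rough
toy model, not anything taken from Postgres or the kernel. It assumes
an 8k Postgres page, a 4k memory page for case 1 and a 512-byte
hardware block for case 2. The names torn_image and tear_offsets are
made up for this sketch, and a real system would never have the old
and new images side by side to compare like this.

```python
from typing import Iterable, List

PAGE_SIZE = 8192  # assumed Postgres page size (the 8k buffer above)


def torn_image(old: bytes, new: bytes, unit: int,
               written: Iterable[int]) -> bytes:
    """Hypothetical helper: the on-disk page if only the `unit`-sized
    chunks whose indexes are in `written` reached disk."""
    assert len(old) == len(new) == PAGE_SIZE
    assert PAGE_SIZE % unit == 0
    image = bytearray(old)
    for i in written:
        image[i * unit:(i + 1) * unit] = new[i * unit:(i + 1) * unit]
    return bytes(image)


def tear_offsets(image: bytes, new: bytes, unit: int) -> List[int]:
    """Byte offsets at which the image switches between new and stale
    content. Chunks identical in both versions count as new."""
    is_new = [image[off:off + unit] == new[off:off + unit]
              for off in range(0, PAGE_SIZE, unit)]
    return [i * unit for i in range(1, len(is_new))
            if is_new[i] != is_new[i - 1]]


old_page = b"\x00" * PAGE_SIZE
new_page = b"\xff" * PAGE_SIZE

# Case 1: the kernel flushed only the first 4k memory page.
case1 = torn_image(old_page, new_page, unit=4096, written=[0])

# Case 2: power was lost after three 512-byte blocks were written.
case2 = torn_image(old_page, new_page, unit=512, written=[0, 1, 2])

print(tear_offsets(case1, new_page, unit=512))
print(tear_offsets(case2, new_page, unit=512))
```

In this model the case 1 tear can only fall at offset 4096, while the
case 2 tear can fall at any multiple of 512 within the page.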

--
greg
