Re: Protecting against unexpected zero-pages: proposal

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PGSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Protecting against unexpected zero-pages: proposal
Date: 2010-11-07 05:04:27
Message-ID: AANLkTi=p_p2_QPbtHVVcUQzPk7LDwiWr7ixxxW81pTQz@mail.gmail.com
Lists: pgsql-hackers

On Sun, Nov 7, 2010 at 4:23 AM, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com> wrote:
> I understand that it is a pretty low-level change, but IMHO the change is
> minimal and is being applied in well understood places. All the assumptions
> listed have been effective for quite a while, and I don't see these
> assumptions being affected in the near future. Most crucial assumptions we
> have to work with are, that XLogPtr{n, 0xFFFFFFFF} will never be used, and
> that mdextend() is the only place that extends a relation (until we
> implement an md.c sibling, say flash.c or tape.c; the last change to md.c
> regarding mdextend() was in January 2007).

I think the assumption that isn't tested here is what happens if the
server crashes. The logic may work fine as long as nothing goes wrong,
but if something does, it has to be fool-proof.

I think having zero-filled blocks at the end of the file if it has
been extended but hasn't been fsynced is an expected failure mode of a
number of filesystems. The log replay can't assume seeing such a block
is a problem since that may be precisely the result of the crash that
caused the replay. And if you disable checking for this during WAL
replay then you've lost your main chance to actually detect the
problem.
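
To make the ambiguity concrete, here is a minimal sketch of the check
itself. The names (SKETCH_BLCKSZ, page_is_all_zeros) are made up for
illustration and are not PostgreSQL code; 8192 is just the default block
size. The point is that the function can tell you a page is all zeros
but nothing about why, so during replay it cannot separate a crash
artifact from real corruption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name; 8192 bytes is assumed as the block size. */
#define SKETCH_BLCKSZ 8192

/*
 * Return 1 if every byte of the page is zero, 0 otherwise.
 *
 * A 1 here only says "this page is zero-filled". It cannot say whether
 * the page was extended but never fsynced before a crash (expected) or
 * was zeroed by a storage fault (corruption).
 */
static int
page_is_all_zeros(const unsigned char *page, size_t len)
{
    size_t i;

    assert(page != NULL);

    for (i = 0; i < len; i++)
    {
        if (page[i] != 0)
            return 0;
    }
    return 1;
}
```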

Another issue -- though I think a manageable one -- is that I expect
we'll want to be using posix_fallocate() sometime soon. That will
allow efficient guaranteed pre-allocated space with better contiguous
layout than currently. But ext4 can only pretend to give zero-filled
blocks, not any random bitpattern we request. I can see this being an
optional feature that is just not compatible with using
posix_fallocate() though.
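
As a rough illustration of that constraint, the sketch below
preallocates one block with posix_fallocate() and reads it back. The
function name, the PREALLOC_BYTES constant and the /tmp path are my own
assumptions for the example; only the POSIX calls are real. The
preallocated range reads back as zeros, and the call offers no way to
ask for a different fill pattern.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical constant: one block of the assumed 8192-byte size. */
#define PREALLOC_BYTES 8192

/*
 * Preallocate one block in a scratch file and read it back.
 * Returns 1 if the block reads as all zeros, 0 if not, -1 on error.
 */
static int
fallocated_block_is_zero(void)
{
    char path[] = "/tmp/fallocate_sketch_XXXXXX";  /* assumed writable */
    unsigned char buf[PREALLOC_BYTES];
    size_t i;
    int result = 1;
    int fd;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);  /* the file disappears once fd is closed */

    /* posix_fallocate() returns 0 on success, an error number otherwise. */
    if (posix_fallocate(fd, 0, PREALLOC_BYTES) != 0)
    {
        close(fd);
        return -1;
    }

    if (pread(fd, buf, sizeof buf, 0) != (ssize_t) sizeof buf)
    {
        close(fd);
        return -1;
    }

    for (i = 0; i < sizeof buf; i++)
    {
        if (buf[i] != 0)
        {
            result = 0;
            break;
        }
    }

    close(fd);
    return result;
}
```

So a scheme that depends on new blocks carrying a non-zero marker would
have to write that marker itself after preallocating, which gives up
much of the point of preallocating cheaply.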

It does seem like this is kind of part and parcel of adding checksums
to blocks. It's arguably kind of silly to add checksums to blocks but
have a commonly produced bitpattern in corruption cases go
undetected.

--
greg
