Re: New CRC algorithm: Slicing by 8

From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: "Gregory Stark" <gsstark(at)mit(dot)edu>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Gregory Maxwell" <gmaxwell(at)gmail(dot)com>, <mark(at)mark(dot)mielke(dot)cc>, "Gurjeet Singh" <singh(dot)gurjeet(at)gmail(dot)com>, "PGSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New CRC algorithm: Slicing by 8
Date: 2006-10-26 13:48:25
Message-ID: 87lkn3yyqe.fsf@enterprisedb.com
Lists: pgsql-hackers


"Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:

> I've looked into this in more depth following your suggestion: I think
> it seems straightforward to move the xl_prev field from being a header
> to a trailer. That way when we do the test on the back pointer we will
> be assured that there is no torn page affecting the remainder of the
> xlrec. That would make it safer with wal_checksum = off.

Hm. I think in practice this may actually help reduce the exposure to torn
pages. However, in theory there's no particular reason to think the blocks will
be written out in physical order.

The kernel may sync its buffers in some order dictated by its in-memory data
structure and may end up coming across the second half of the 8kb page before
the first half. It may even lie earlier on disk than the first half if the
filesystem started a new extent at that point.

If they were 4kb pages there would be fewer ways it could be written out of
order, but even then the hard drive could find a bad block and remap it. I'm
not sure what level of granularity drives remap at; it may be less than 4kb.

To eliminate the need for the CRC in the WAL for everyone and still be safe
from torn pages, I think you have to have something like xl_prev repeated every
512 bytes throughout the page.
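
As a rough illustration of the per-sector idea, here is a minimal Python
sketch. It assumes a 512-byte atomic write unit and an 8-byte stamp at the
start of every sector; the layout and the names stamp_page and torn_sectors
are hypothetical and are not PostgreSQL's WAL format. It also assumes the
stale data previously on disk carries a different stamp, which is what makes
a sector that never got written detectable.

```python
import struct

SECTOR = 512                  # assumed atomic write unit
PAGE = 8192                   # page size discussed in this thread
STAMP = struct.Struct("<Q")   # hypothetical 8-byte stamp (an xl_prev-like value)


def stamp_page(payload, stamp):
    """Build one page in which every sector begins with the same stamp."""
    room = SECTOR - STAMP.size          # payload bytes available per sector
    sectors = PAGE // SECTOR
    if len(payload) > room * sectors:
        raise ValueError("payload too large for one page")
    payload = payload.ljust(room * sectors, b"\0")
    out = bytearray()
    for i in range(sectors):
        out += STAMP.pack(stamp)
        out += payload[i * room:(i + 1) * room]
    return bytes(out)


def torn_sectors(page, expected):
    """Return the indexes of sectors whose stamp differs from `expected`."""
    bad = []
    for i in range(PAGE // SECTOR):
        (found,) = STAMP.unpack_from(page, i * SECTOR)
        if found != expected:
            bad.append(i)
    return bad
```

For example, if sectors 8 through 11 of a page still hold older data with a
different stamp while the rest were rewritten, torn_sectors reports exactly
those four indexes, regardless of the order in which the sectors reached
disk. The cost is 8 bytes out of every 512 in this sketch.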

But if this is only an option for systems that don't expect to suffer from
torn pages then sure, putting it in a footer seems like a good way to reduce
the exposure somewhat. Putting it in both a header *and* a footer might be
even better.
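
To make the remaining exposure concrete, here is a small Python simulation
under the same assumption of 512-byte atomic sectors. The record layout (an
8-byte back pointer at both ends) and the names build_record, ends_match and
simulate_write are made up for illustration only.

```python
import struct

SECTOR = 512                 # assumed atomic write unit
MARK = struct.Struct("<Q")   # hypothetical 8-byte back pointer


def build_record(back_ptr, body):
    """Hypothetical record: back pointer, body, back pointer repeated."""
    return MARK.pack(back_ptr) + body + MARK.pack(back_ptr)


def ends_match(record, expected):
    """Check only the header and trailer copies of the back pointer."""
    (head,) = MARK.unpack_from(record, 0)
    (tail,) = MARK.unpack_from(record, len(record) - MARK.size)
    return head == expected and tail == expected


def simulate_write(old, new, written_sectors):
    """Overlay only the listed sectors of `new` onto the `old` disk contents."""
    out = bytearray(old)
    for i in written_sectors:
        out[i * SECTOR:(i + 1) * SECTOR] = new[i * SECTOR:(i + 1) * SECTOR]
    return bytes(out)


body_len = 4 * SECTOR - 2 * MARK.size      # record spans exactly four sectors
old = build_record(100, b"o" * body_len)   # stale data already on disk
new = build_record(200, b"n" * body_len)   # record being written

in_order = simulate_write(old, new, [0, 1])       # crash after first two sectors
out_of_order = simulate_write(old, new, [0, 3])   # first and last sector only

print(ends_match(in_order, 200))                          # False: tear detected
print(ends_match(out_of_order, 200), out_of_order == new)  # True False: tear missed
```

The in-order tear is caught because the trailer is still stale. The
out-of-order case passes the check even though the two middle sectors hold
old data, which is the gap a header plus footer leaves open.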

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
