Re: New CRC algorithm: Slicing by 8

From: mark(at)mark(dot)mielke(dot)cc
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>, PGSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New CRC algorithm: Slicing by 8
Date: 2006-10-24 14:18:16
Message-ID: 20061024141816.GA9658@mark.mielke.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Oct 24, 2006 at 02:52:36PM +0100, Simon Riggs wrote:
> On Tue, 2006-10-24 at 09:37 -0400, Tom Lane wrote:
> > "Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:
> > > On Mon, 2006-10-23 at 15:12 -0400, Tom Lane wrote:
> > >> There are actually three checks used to detect end of WAL: zero record
> > >> length, invalid checksum, and incorrect back-pointer. Zero length is
> > >> the first and cleanest-looking test, but AFAICS we have to have both of
> > >> the others to avoid obvious failure modes.
> > > The checksum protects from torn pages and disk errors. If you have
> > > full_page_writes set then you already believe yourself safe from torn
> > > pages and your disks could also already be CRC-checking the data.
> > No, because unlike tuples, WAL records can and do cross page boundaries.

> But not that often, with full_page_writes = off. So we could get away
> with just CRC checking the page-spanning ones and mark the records to
> show whether they have been CRC checked or not and need to be rechecked
> at recovery time. That would reduce the CRC overhead to about 1-5% of
> what it is now (as an option).

WAL pages 8 Kbytes, and disk pages 512 bytes, correct? I don't see a
guarantee in here that the 8 Kbytes worth of data will be written as
sequential writes, nor that the 8 Kbytes of data will necessarily
finish.

If the operating system uses 8 Kbyte pages, or the RAID system uses 8
Kbytes or larger chunks, and they guarantee sequential writes, perhaps
it is ok. Still, if the power goes out after writing the first 512
bytes, 2048 bytes, or 4096 bytes, then what? With RAID involved it
might get better or worse, depending on the RAID configuration.

I'm almost wondering whether the three numbers are enough. I'm too busy
to sketch it all down and predict failure points... :-)

> Just a thought: Would there be benefit in not allowing page-spanning WAL
> records, when they are small? That would tend to reduce the number of
> WAL writes, even if it did cause some space wastage on disk. That would
> reduce the number of same block re-writes and might improve the
> sequential behaviour of WAL access. We never needed to think about this
> while full_page_writes=on was the only option.

Probably. Might not be much though.

> > > CRC-checked disks are actually the industry norm and have been for
> > > around 5 years.
> > Huh? Disks have ALWAYS had CRCs, and this is in any case utterly
> > irrelevant to the database-crash risk.
> According to the man from Seagate...

Once upon a time when bits were stored the size of smarties (exaggeration)....
Back in those days (before my time), they liked to use things like parity.

I didn't read the page, but perhaps there is some confusion between
CRC and error correction codes. Obviously, technologies were
introduced over time. I don't remember ever having a hard disk that
didn't have some sort of error detection. The 5 year claim seems
decades too short unless they are talking about a newer technology.
Even the old 5.25" DOS floppies seemed to be able to detect errors
rather than return invalid corrupted bits.

Cheers,
mark

--
mark(at)mielke(dot)cc / markm(at)ncf(dot)ca / markm(at)nortel(dot)com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2006-10-24 14:34:57 Re: Release stamping (Was: [CORE] Schedule for release?)
Previous Message Tom Lane 2006-10-24 14:11:43 Re: WAL logging freezing