Re: Cost of XLogInsert CRC calculations

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>, "'Manfred Koizar'" <mkoi-pg(at)aon(dot)at>, "'Greg Stark'" <gsstark(at)mit(dot)edu>, "'Bruce Momjian'" <pgman(at)candle(dot)pha(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Cost of XLogInsert CRC calculations
Date: 2005-05-31 14:53:08
Message-ID: 878y1vwip7.fsf@stark.xeocode.com
Lists: pgsql-hackers


Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> It's not really a matter of backstopping the hardware's error detection;
> if we were trying to do that, we'd keep a CRC for every data page, which
> we don't. The real reason for the WAL CRCs is as a reliable method of
> identifying the end of WAL: when the "next record" doesn't checksum you
> know it's bogus. This is a nontrivial point because of the way that we
> re-use WAL files --- the pages beyond the last successfully written page
> aren't going to be zeroes, they'll be filled with random WAL data.

Is the random WAL data really the concern? It seems like a more reliable way
of dealing with that would be to just accompany every WAL entry with a
sequential id and stop when the next id isn't the correct one.
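
To make the sequential-id idea concrete, here is a minimal sketch in Python.
The Record type, its seq field, and replay_until_gap are hypothetical names
made up for illustration; nothing here reflects the actual WAL record layout.
It assumes each entry carries an id that counts up by one, and that leftover
entries in a recycled file carry older ids.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    seq: int        # hypothetical sequential id stored with each WAL entry
    payload: bytes  # the entry's contents (opaque here)


def replay_until_gap(records: List[Record], first_seq: int) -> List[Record]:
    """Return the leading run of records whose ids count up from first_seq."""
    valid = []
    expected = first_seq
    for rec in records:
        if rec.seq != expected:
            break  # id is out of sequence: treat this as the end of WAL
        valid.append(rec)
        expected += 1
    return valid


# A recycled file: three fresh entries (ids 100..102) followed by
# leftovers from the file's previous use (ids 43, 44).
segment = [
    Record(100, b"a"),
    Record(101, b"b"),
    Record(102, b"c"),
    Record(43, b"old"),
    Record(44, b"old"),
]
replayed = replay_until_gap(segment, first_seq=100)
print([r.seq for r in replayed])
```

The scan stops at the first entry whose id is not the expected one, so the
stale tail of the file is never replayed. Note that this only checks the id,
not the rest of the entry.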

I thought the problem was that if the machine crashed in the middle of writing
a WAL entry, you wanted to be sure to detect that. And there's no guarantee the
fsync will write out the WAL entry in order, so it's possible the end (and
beginning) of the WAL entry will be there while the middle is still
unwritten.
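
A rough sketch of that failure case, in Python. The record layout (id,
length, CRC, payload), the 512-byte sector size, and the function names are
all assumptions invented for this example, and zlib.crc32 stands in for
whatever CRC the WAL code really uses. The point is only that an entry whose
first and last sectors reached disk, but whose middle sector did not, still
has the right id in its header and is caught only by the checksum.

```python
import struct
import zlib
from typing import Tuple

SECTOR = 512   # assumed sector size
HEADER = 16    # seq (8 bytes) + length (4 bytes) + crc32 (4 bytes)


def make_record(seq: int, payload: bytes) -> bytes:
    """Build a record in the hypothetical layout: seq | length | crc | payload."""
    crc = zlib.crc32(struct.pack("<QI", seq, len(payload)) + payload)
    return struct.pack("<QII", seq, len(payload), crc) + payload


def check_record(buf: bytes, expected_seq: int) -> Tuple[bool, bool]:
    """Return (id matches, CRC matches) for the record stored in buf."""
    seq, length, crc = struct.unpack_from("<QII", buf, 0)
    payload = buf[HEADER:HEADER + length]
    seq_ok = seq == expected_seq
    crc_ok = zlib.crc32(struct.pack("<QI", seq, length) + payload) == crc
    return seq_ok, crc_ok


# A record that spans exactly three sectors.
payload = (bytes(range(256)) * 6)[:3 * SECTOR - HEADER]
new = make_record(7, payload)

# What was on disk before: stale bytes from the file's previous use.
stale = b"\xAA" * len(new)

# Crash mid-write: sectors 0 and 2 made it to disk, sector 1 did not.
torn = new[:SECTOR] + stale[SECTOR:2 * SECTOR] + new[2 * SECTOR:]

print("intact:", check_record(new, 7))
print("torn:  ", check_record(torn, 7))
```

For the torn record the id check alone passes, because the header sits in the
first sector; it is the CRC over the whole entry that exposes the missing
middle.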

The only truly reliable way to handle this would require two fsyncs per
transaction commit, which would be really unfortunate.

> Personally I think CRC32 is plenty for this job, but there were those
> arguing loudly for CRC64 back when we made the decision originally ...

So, given the frequency of database crashes and WAL replays, if one failed
replay every few million years is acceptable, I think 32 bits is more than
enough. Frankly, I think 16 bits would be enough.
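
As a back-of-the-envelope check on that kind of estimate, here is a small
Python sketch. It rests on two idealized assumptions that are not from this
thread: a bogus record passes an n-bit checksum with probability 2^-n, and
some fixed number of bogus records gets checked per year (one per crash
replay, say). The function name and the rates plugged in are purely
illustrative.

```python
def years_between_false_accepts(crc_bits: int, bogus_checks_per_year: float) -> float:
    """Expected years until a bogus record passes the checksum by chance.

    Assumes each bogus record matches with probability 2**-crc_bits, so on
    average 2**crc_bits bogus records are checked before one slips through.
    """
    return (2 ** crc_bits) / bogus_checks_per_year


# Illustrative rates only; the message does not state a crash frequency.
for bits in (16, 32, 64):
    for rate in (1.0, 1000.0):
        years = years_between_false_accepts(bits, rate)
        print(f"{bits}-bit CRC, {rate:g} bogus checks/year: {years:.3g} years")
```

Under these assumptions, a rate of about a thousand bogus-record checks per
year is what puts a 32-bit CRC in the range of a few million years between
false accepts; at one check per year a 16-bit CRC gives 65536 years.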

--
greg
