Re: CRCs (was: beta testing version)

From: ncm(at)zembu(dot)com (Nathan Myers)
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: CRCs (was: beta testing version)
Date: 2000-12-07 20:25:41
Message-ID: 20001207122541.A30335@store.zembu.com
Lists: pgsql-general pgsql-hackers

On Wed, Dec 06, 2000 at 06:53:37PM -0600, Bruce Guenter wrote:
> On Wed, Dec 06, 2000 at 11:08:00AM -0800, Nathan Myers wrote:
> > On Wed, Dec 06, 2000 at 11:49:10AM -0600, Bruce Guenter wrote:
> > >
> > > I don't know how pgsql does it, but the only safe way I know of
> > > is to include an "end" marker after each record.
> >
> > An "end" marker is not sufficient, unless all writes are done in
> > one-sector units with an fsync between, and the drive buffering
> > is turned off.
>
> That's why an end marker must follow all valid records. When you write
> records, you don't touch the marker, and add an end marker to the end of
> the records you've written. After writing and syncing the records, you
> rewrite the end marker to indicate that the data following it is valid,
> and sync again. There is no state in that sequence in which partially-
> written data could be confused as real data, assuming either your drives
> aren't doing write-back caching or you have a UPS, and fsync doesn't
> return until the drives return success.

That requires an extra out-of-sequence write.
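For concreteness, here is a minimal sketch of the end-marker scheme as I read Bruce's description. The on-disk layout (a one-byte marker, a 4-byte length, then the payload) and the names stage, commit, read_records and demo are all made up for illustration; none of this is taken from the pgsql sources. The second write in commit() seeks backward to a marker that is already on disk, and that is the out-of-sequence write.

```python
import os
import struct
import tempfile

END, VALID = b"\x00", b"\x01"   # hypothetical one-byte markers

def stage(f, end_pos, payload):
    """Write length + payload + a new END marker after the old END marker.

    The old marker at end_pos is not touched, so a reader still stops there.
    Returns the position of the new END marker.
    """
    f.seek(end_pos + 1)
    f.write(struct.pack("<I", len(payload)) + payload + END)
    f.flush()
    os.fsync(f.fileno())            # first sync: the data is on disk
    return end_pos + 1 + 4 + len(payload)

def commit(f, end_pos):
    """Flip the old END marker to VALID: the extra out-of-sequence write."""
    f.seek(end_pos)
    f.write(VALID)
    f.flush()
    os.fsync(f.fileno())            # second sync: the record is now visible

def read_records(f):
    """Return every payload that precedes the first END marker."""
    records, pos = [], 0
    while True:
        f.seek(pos)
        if f.read(1) != VALID:
            return records
        (n,) = struct.unpack("<I", f.read(4))
        records.append(f.read(n))
        pos += 1 + 4 + n

def demo():
    """Show what a reader sees before and after the second record is committed."""
    with tempfile.TemporaryFile() as f:
        f.write(END)                # an empty log is a lone END marker
        f.flush()
        pos = 0
        nxt = stage(f, pos, b"alpha")
        commit(f, pos)
        pos = nxt
        stage(f, pos, b"beta")      # simulated crash point: staged, not committed
        before = read_records(f)
        commit(f, pos)
        after = read_records(f)
        return before, after
```

Each appended record costs two writes and two syncs, and the second write lands behind the data just written.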

> > > Any other way I've seen discussed (here and elsewhere) either
> > > - Assume that a CRC is a guarantee.
> >
> > We are already assuming a CRC is a guarantee.
> >
> > The drive computes a CRC for each sector, and if the CRC is OK the
> > drive is happy. CRC errors within the drive are quite frequent, and
> > the drive re-reads when a bad CRC comes up.
>
> The kind of data failures that a CRC is guaranteed to catch (N-bit
> errors) are almost precisely those that a mis-read on a hardware sector
> would cause.

A CRC catches a single mis-read, but not necessarily a double mis-read,
which is quite likely.

> > > ... A CRC would be a good addition to
> > > help ensure the data wasn't broken by flakey drive firmware, but
> > > doesn't guarantee consistency.
> >
> > No, a CRC would be a good addition to compensate for sector write
> > reordering, which is done both by the OS and by the drive, even for
> > "atomic" writes.
>
> But it doesn't guarantee consistency, even in that case. There is a
> possibility (however small) that the random data that was located in
> the sectors before the write will match the CRC.

Generally, there are no guarantees, only reasonable expectations. A
64-bit CRC would give sufficient confidence without the out-of-sequence
write, and would also detect corruption from any source, including a
power outage.
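
A rough sketch of the CRC alternative, under stated assumptions: each record is written once, in sequence, as length + payload + CRC, and a reader stops at the first record whose CRC does not match. The record layout and the names frame and scan are hypothetical. Python's standard library has no 64-bit CRC, so the 32-bit zlib.crc32 stands in for it here. If the stale bytes are uniformly random, a false match has a chance of about 2^-32 with this stand-in, against about 2^-64 for a 64-bit CRC.

```python
import struct
import zlib

def frame(payload: bytes) -> bytes:
    """Build one record: 4-byte length, payload, 4-byte CRC over both."""
    header = struct.pack("<I", len(payload))
    crc = zlib.crc32(header + payload)   # 32-bit stand-in for a 64-bit CRC
    return header + payload + struct.pack("<I", crc)

def scan(log: bytes) -> list:
    """Return payloads up to the first truncated or corrupt record."""
    records, pos = [], 0
    while pos + 8 <= len(log):
        (n,) = struct.unpack_from("<I", log, pos)
        end = pos + 4 + n + 4
        if end > len(log):               # record runs past the end: torn write
            break
        (stored,) = struct.unpack_from("<I", log, pos + 4 + n)
        if zlib.crc32(log[pos:pos + 4 + n]) != stored:
            break                        # stale or damaged bytes
        records.append(log[pos + 4:pos + 4 + n])
        pos = end
    return records
```

Appending is a single sequential write of frame(payload); no marker behind it needs rewriting, and the same check catches damage that has nothing to do with a torn write.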

(I'd also like to see CRCs on all the table blocks; is there
a place to put them?)

Nathan Myers
ncm(at)zembu(dot)com
