From: | ncm(at)zembu(dot)com (Nathan Myers) |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: CRCs (was: beta testing version) |
Date: | 2000-12-07 20:25:41 |
Message-ID: | 20001207122541.A30335@store.zembu.com |
Lists: | pgsql-general pgsql-hackers |
On Wed, Dec 06, 2000 at 06:53:37PM -0600, Bruce Guenter wrote:
> On Wed, Dec 06, 2000 at 11:08:00AM -0800, Nathan Myers wrote:
> > On Wed, Dec 06, 2000 at 11:49:10AM -0600, Bruce Guenter wrote:
> > >
> > > I don't know how pgsql does it, but the only safe way I know of
> > > is to include an "end" marker after each record.
> >
> > An "end" marker is not sufficient, unless all writes are done in
> > one-sector units with an fsync between, and the drive buffering
> > is turned off.
>
> That's why an end marker must follow all valid records. When you write
> records, you don't touch the marker, and add an end marker to the end of
> the records you've written. After writing and syncing the records, you
> rewrite the end marker to indicate that the data following it is valid,
> and sync again. There is no state in that sequence in which partially-
> written data could be confused as real data, assuming either your drives
> aren't doing write-back caching or you have a UPS, and fsync doesn't
> return until the drives return success.
That requires an extra out-of-sequence write.
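The end-marker protocol, including the extra write Nathan objects to, can be modeled as follows. This is a minimal sketch, not PostgreSQL code. The slot layout, the `CONT` marker and the `MarkerLog` class are hypothetical. It also assumes that fsync is a true ordering barrier with no write-back caching, which is the condition the quoted message stipulates. Phase 2 is the out-of-sequence write: instead of continuing forward, it goes back to the old marker's position.

```python
# Hypothetical in-memory model of the end-marker protocol.
# The log is a list of slots; os.fsync() is modeled by a counter, and each
# sync is assumed to be a true ordering barrier, as the protocol requires.
END = ("END",)    # stops the reader
CONT = ("CONT",)  # a former END marker, rewritten to mean "keep reading"


class MarkerLog:
    def __init__(self):
        self.slots = [END]  # an empty log is just one end marker
        self.syncs = 0      # number of simulated fsync() calls

    def fsync(self):
        self.syncs += 1

    def _end_pos(self):
        """Index of the end marker a reader would stop at."""
        i = 0
        while self.slots[i] != END:
            i += 1
        return i

    def append(self, payloads, crash_before_commit=False):
        end = self._end_pos()
        # Phase 1: write the new records plus a fresh END after the current
        # marker (discarding leftovers of any earlier failed append), without
        # touching the current marker. Then sync.
        del self.slots[end + 1:]
        self.slots.extend(("REC", p) for p in payloads)
        self.slots.append(END)
        self.fsync()
        if crash_before_commit:
            return  # simulated power loss: the old END still hides the new data
        # Phase 2: the extra out-of-sequence write back to the old marker,
        # which makes the new records visible. Then sync again.
        self.slots[end] = CONT
        self.fsync()

    def records(self):
        """What a reader recovers: records up to the first END marker."""
        out = []
        for kind, *rest in self.slots:
            if kind == "END":
                break
            if kind == "REC":
                out.append(rest[0])
        return out


log = MarkerLog()
log.append([b"a", b"b"])
log.append([b"c"], crash_before_commit=True)
print(log.records())  # [b'a', b'b']
```

If the system crashes between the two syncs, the new records remain hidden behind the old marker, which is the consistency the protocol is after. The cost is a second, backward write plus a second sync for every append.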
> > > Any other way I've seen discussed (here and elsewhere) either
> > > - Assume that a CRC is a guarantee.
> >
> > We are already assuming a CRC is a guarantee.
> >
> > The drive computes a CRC for each sector, and if the CRC is OK the
> > drive is happy. CRC errors within the drive are quite frequent, and
> > the drive re-reads when a bad CRC comes up.
>
> The kind of data failures that a CRC is guaranteed to catch (N-bit
> errors) are almost precisely those that a mis-read on a hardware sector
> would cause.
They catch a single mis-read, but not necessarily a double mis-read,
which is quite likely.
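A small CRC shows the gap between "guaranteed to catch N-bit errors" and "not necessarily". This is a minimal sketch. The CRC-8 with polynomial 0x07 is chosen only for illustration, since neither message names a width or polynomial. Every single-bit flip changes the checksum. An error pattern equal to the generator polynomial x^8 + x^2 + x + 1, however, flips four bits and leaves the checksum unchanged. Every CRC has undetectable patterns of this kind; a wider CRC only makes them rarer.

```python
def crc8(data: bytes, poly: int = 0x07) -> int:
    """MSB-first CRC-8 with initial value 0 and no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def flip_bit(data: bytes, bit: int) -> bytes:
    """Return a copy of data with one bit inverted (MSB-first numbering)."""
    buf = bytearray(data)
    buf[bit // 8] ^= 0x80 >> (bit % 8)
    return bytes(buf)


sector = b"example sector contents"
good = crc8(sector)

# Every single-bit error is detected: x^k is never a multiple of the generator.
single_ok = all(crc8(flip_bit(sector, b)) != good for b in range(len(sector) * 8))

# A four-bit error shaped like the generator (bytes 0x01 0x07) is a multiple
# of it, so the corrupted sector produces the same CRC as the original.
bad = bytearray(sector)
bad[3] ^= 0x01
bad[4] ^= 0x07
undetected = bytes(bad) != sector and crc8(bytes(bad)) == good

print(single_ok, undetected)  # True True
```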
> > > ... A CRC would be a good addition to
> > > help ensure the data wasn't broken by flakey drive firmware, but
> > > doesn't guarantee consistency.
> > No, a CRC would be a good addition to compensate for sector write
> > reordering, which is done both by the OS and by the drive, even for
> > "atomic" writes.
>
> But it doesn't guarantee consistency, even in that case. There is a
> possibility (however small) that the random data that was located in
> the sectors before the write will match the CRC.
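To attach a rough number to "however small": assume the stale sector contents behave like uniformly random bits. In that case an n-bit CRC accepts them by accident with probability about 2^{-n}. The uniform-randomness assumption is a simplification, because real leftover disk contents need not be random.

```latex
P(\text{stale data passes an } n\text{-bit CRC}) \approx 2^{-n},
\qquad 2^{-32} \approx 2.3\times10^{-10},
\qquad 2^{-64} \approx 5.4\times10^{-20}
```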
Generally, there are no guarantees, only reasonable expectations. A
64-bit CRC would give sufficient confidence without the out-of-sequence
write, and also detect corruption from any source including power outage.
(I'd also like to see CRCs on all the table blocks; is there
a place to put them?)
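The CRC-only alternative works like this: frame each record with a CRC, and at recovery accept records only up to the first one that is truncated or fails its check. No marker ever has to be rewritten. This is a minimal sketch, not PostgreSQL's actual WAL format. The length + payload + CRC-64 layout and the function names are hypothetical. The ECMA-182 polynomial is also an illustrative choice, because the message specifies only the 64-bit width.

```python
import struct

POLY64 = 0x42F0E1EBA9EA3693  # ECMA-182 generator, chosen for illustration
MASK64 = (1 << 64) - 1


def crc64(data: bytes) -> int:
    """Bitwise MSB-first CRC-64 with initial value 0 and no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 56
        for _ in range(8):
            if crc & (1 << 63):
                crc = ((crc << 1) ^ POLY64) & MASK64
            else:
                crc = (crc << 1) & MASK64
    return crc


def frame(payload: bytes) -> bytes:
    """Hypothetical record layout: 4-byte length, payload, 8-byte CRC."""
    header = struct.pack(">I", len(payload))
    return header + payload + struct.pack(">Q", crc64(header + payload))


def recover(log: bytes) -> list:
    """Return payloads up to the first truncated or CRC-failing record."""
    out, pos = [], 0
    while pos + 4 <= len(log):
        (length,) = struct.unpack_from(">I", log, pos)
        end = pos + 4 + length
        if end + 8 > len(log):
            break  # torn write: the record runs past the end of the data
        (stored,) = struct.unpack_from(">Q", log, end)
        if crc64(log[pos:end]) != stored:
            break  # stale or reordered sectors: treat as the end of the log
        out.append(log[pos + 4:end])
        pos = end + 8
    return out


log = frame(b"rec1") + frame(b"rec2") + frame(b"rec3")
print(recover(log[:-3]))  # [b'rec1', b'rec2']
```

Because each record carries its own check, appends stay strictly sequential. The cost is that a stale record which happens to pass its CRC would be accepted. For random stale bytes, that happens with probability about 2^-64.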
Nathan Myers
ncm(at)zembu(dot)com