Re: fault tolerance...

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Christopher Quinn <cq(at)htec(dot)demon(dot)co(dot)uk>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: fault tolerance...
Date: 2002-03-19 15:39:57
Message-ID: 29207.1016552397@sss.pgh.pa.us
Lists: pgsql-hackers

Christopher Quinn <cq(at)htec(dot)demon(dot)co(dot)uk> writes:
> I've been wondering how pgsql goes about guaranteeing data
> integrity in the face of soft failures. In particular, whether it
> uses an alternative to the double root block technique, which is
> writing some meta information, including the location of the last
> log record written, to alternate disk blocks at fixed disk
> locations, as a final indication that new log records are valid.
> This is the only technique I know of - does pgsql use
> something analogous?

The WAL log uses per-record CRCs plus sequence numbers (both per-record
and per-page) as a way of determining where valid information stops.
I don't see any need for relying on a "root block" in the sense you
describe.
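
For illustration, here is a minimal sketch of that kind of end-of-log
detection in C (this is not the actual xlog code; the record layout, field
names, and functions below are invented for the example).  The reader walks
the log, recomputing each record's CRC and expecting the sequence number to
advance by one, and treats the first record that fails either check as the
end of the valid log:

/*
 * Illustrative sketch only, not PostgreSQL's actual xlog code: find the
 * end of valid log data by checking a per-record CRC and a running
 * sequence number, and stop at the first record that fails either test.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct
{
    uint64_t seqno;   /* must advance by one per record */
    uint32_t len;     /* length of the payload that follows the header */
    uint32_t crc;     /* CRC-32 of seqno, len, and payload */
} RecordHeader;

/* Plain bitwise CRC-32 (reflected, polynomial 0xEDB88320), chainable. */
static uint32_t
crc32_buf(uint32_t crc, const uint8_t *buf, size_t len)
{
    crc = ~crc;
    for (size_t i = 0; i < len; i++)
    {
        crc ^= buf[i];
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ (0xEDB88320u & (uint32_t) -(int32_t) (crc & 1));
    }
    return ~crc;
}

static uint32_t
record_crc(const RecordHeader *hdr, const uint8_t *payload)
{
    uint32_t crc = 0;

    crc = crc32_buf(crc, (const uint8_t *) &hdr->seqno, sizeof(hdr->seqno));
    crc = crc32_buf(crc, (const uint8_t *) &hdr->len, sizeof(hdr->len));
    return crc32_buf(crc, payload, hdr->len);
}

/*
 * Scan 'log' and return the offset just past the last record that passes
 * both the CRC check and the sequence-number check.
 */
static size_t
find_end_of_valid_log(const uint8_t *log, size_t size)
{
    size_t   offset = 0;
    uint64_t expected_seqno = 1;

    while (offset + sizeof(RecordHeader) <= size)
    {
        RecordHeader hdr;

        memcpy(&hdr, log + offset, sizeof(hdr));
        if (offset + sizeof(hdr) + hdr.len > size)
            break;          /* truncated record */
        if (hdr.seqno != expected_seqno)
            break;          /* stale or out-of-order record */
        if (record_crc(&hdr, log + offset + sizeof(hdr)) != hdr.crc)
            break;          /* torn or corrupted record */

        offset += sizeof(hdr) + hdr.len;
        expected_seqno++;
    }
    return offset;
}

int
main(void)
{
    uint8_t      buf[256] = {0};
    const char   payload[] = "first record";
    RecordHeader hdr = {1, sizeof(payload), 0};

    memcpy(buf + sizeof(hdr), payload, sizeof(payload));
    hdr.crc = record_crc(&hdr, buf + sizeof(hdr));
    memcpy(buf, &hdr, sizeof(hdr));

    /* The zeroed bytes after the record fail the checks, so the scan stops. */
    printf("valid data ends at offset %zu\n",
           find_end_of_valid_log(buf, sizeof(buf)));
    return 0;
}

A torn write at the tail fails the CRC, while stale data left over in a
recycled log file fails the sequence-number check, so either way replay
stops before applying garbage.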

> Lastly, is there any form of integrity checking on disk
> block level data? I have vague recollections of seeing
> mention of crc/xor in relation to Oracle or DB2.

At present we rely on the disk drive to not drop data once it's been
successfully fsync'd (at least not without detecting a read error later).
There was some discussion of adding per-page CRCs as a second-layer
check, but no one seems very excited about it. The performance costs
would be nontrivial and we have not seen all that many reports of field
failures in which a CRC would have improved matters.
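
For what it's worth, a hypothetical per-page CRC of that sort could look
roughly like the sketch below (again, invented for illustration, not
existing PostgreSQL code): the checksum is computed over the page body just
before the page is written and re-verified whenever the page is read back,
so a silently dropped or garbled write turns into a detectable error.

/*
 * Illustrative sketch only, not an existing PostgreSQL feature: a per-page
 * CRC as a second-layer check on top of trusting fsync.  The checksum is
 * computed over the page body just before the page is written, and
 * re-verified whenever the page is read back.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PAGE_SIZE 8192

typedef struct
{
    uint32_t crc;                           /* covers 'data' below */
    uint8_t  data[PAGE_SIZE - sizeof(uint32_t)];
} Page;

/* Plain bitwise CRC-32 (reflected, polynomial 0xEDB88320). */
static uint32_t
crc32_buf(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++)
    {
        crc ^= buf[i];
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ (0xEDB88320u & (uint32_t) -(int32_t) (crc & 1));
    }
    return ~crc;
}

/* Call just before the page is written out. */
static void
page_set_crc(Page *page)
{
    page->crc = crc32_buf(page->data, sizeof(page->data));
}

/* Call after the page is read back; false means the page is corrupt. */
static bool
page_crc_ok(const Page *page)
{
    return page->crc == crc32_buf(page->data, sizeof(page->data));
}

int
main(void)
{
    static Page page;               /* zero-initialized 8K page */

    memcpy(page.data, "some tuple data", 16);
    page_set_crc(&page);
    printf("clean page ok:   %d\n", page_crc_ok(&page));

    page.data[100] ^= 1;            /* simulate a flipped bit on disk */
    printf("corrupt page ok: %d\n", page_crc_ok(&page));
    return 0;
}

The extra CRC pass over every page on each read and write is where the
nontrivial performance cost mentioned above would come from.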

regards, tom lane
