
WAL's single point of failure: latest CHECKPOINT record

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Vadim Mikheev <vadim4o(at)email(dot)com>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: WAL's single point of failure: latest CHECKPOINT record
Date: 2001-03-01 17:24:21
Message-ID: 21559.983467461@sss.pgh.pa.us
Lists: pgsql-hackers

As the WAL stuff is currently constructed, the system will refuse to
start up unless the checkPoint field of pg_control points at a valid
checkpoint record in the WAL log.

Now I know we write and fsync the checkpoint record before we rewrite
pg_control, but this still leaves me feeling mighty uncomfortable.
See past discussions about how fsync order doesn't necessarily mean
anything if the disk drive chooses to reorder writes.  Since loss of
the checkpoint record means complete loss of the database, I think we
need to work harder here.
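To make the ordering concrete, here is a minimal POSIX sketch of the "write and fsync the checkpoint record, then rewrite pg_control" sequence. The function names and the use of two plain file descriptors are hypothetical simplifications, not the actual WAL code. The comment marks where the hazard lies: fsync() returning only means the kernel has handed the data to the device, so a drive that caches and reorders writes could still persist the control-file update before the checkpoint record.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Write all of buf, retrying on short writes; 0 on success, -1 on error. */
static int
write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0)
    {
        ssize_t n = write(fd, p, len);

        if (n < 0)
            return -1;
        p += n;
        len -= (size_t) n;
    }
    return 0;
}

/*
 * Hypothetical sketch: make the checkpoint record durable in the WAL file
 * first, and only then overwrite the control file so it points at it.
 */
static int
write_checkpoint_then_control(int wal_fd, const void *ckpt, size_t ckpt_len,
                              int ctl_fd, const void *ctl, size_t ctl_len)
{
    assert(wal_fd != ctl_fd);   /* WAL and control data are separate files */

    if (write_all(wal_fd, ckpt, ckpt_len) != 0 || fsync(wal_fd) != 0)
        return -1;

    /*
     * fsync() has returned, but a drive with a volatile write cache may
     * still persist the control-file write below before the WAL write
     * above: the ordering is only as strong as the hardware honors it.
     */
    if (lseek(ctl_fd, 0, SEEK_SET) < 0 ||
        write_all(ctl_fd, ctl, ctl_len) != 0 || fsync(ctl_fd) != 0)
        return -1;
    return 0;
}
```

Nothing in this sequence lets the software detect that the drive reordered the two writes; that is why a fallback at read time, rather than a stricter write protocol, is attractive.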

What I'm thinking is that pg_control should have pointers to the last
two checkpoint records, not only the last one.  If we fail to read the
most recent checkpoint, try the one before it (which, obviously, means
we must keep the log files long enough that we still have that one too).
We can run forward from there and redo the intervening WAL records the
same as we would do anyway.
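A minimal sketch of the proposed two-pointer scheme follows. The struct, field, and function names (ControlData, checkPoint, prevCheckPoint, XLogPtr, and the helpers) are hypothetical stand-ins, not PostgreSQL's definitions. It also assumes the caller has already tried to read each checkpoint record (CRC and record-type checks) and passes in the results, and it simplifies by treating the checkpoint record's own position as the place where redo starts and as the oldest WAL that must be kept.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simplified WAL position: a byte offset into the log. */
typedef uint64_t XLogPtr;
#define INVALID_XLOG_PTR ((XLogPtr) 0)

/* Hypothetical pg_control layout holding the last TWO checkpoints. */
typedef struct ControlData
{
    XLogPtr checkPoint;      /* most recent checkpoint record */
    XLogPtr prevCheckPoint;  /* the checkpoint before it */
} ControlData;

/* When a new checkpoint is written, the old latest becomes "previous". */
static void
record_new_checkpoint(ControlData *ctl, XLogPtr newCheckPoint)
{
    assert(newCheckPoint > ctl->checkPoint);
    ctl->prevCheckPoint = ctl->checkPoint;
    ctl->checkPoint = newCheckPoint;
}

/*
 * At startup: use the latest checkpoint if its record reads back valid,
 * otherwise fall back to the previous one and redo forward from there.
 * Returns INVALID_XLOG_PTR if neither record is usable.
 */
static XLogPtr
choose_redo_checkpoint(const ControlData *ctl, bool latest_valid,
                       bool prev_valid)
{
    if (ctl->checkPoint != INVALID_XLOG_PTR && latest_valid)
        return ctl->checkPoint;
    if (ctl->prevCheckPoint != INVALID_XLOG_PTR && prev_valid)
        return ctl->prevCheckPoint;
    return INVALID_XLOG_PTR;
}

/*
 * WAL older than this position may be recycled. Because the fallback
 * needs the previous checkpoint, retention is governed by prevCheckPoint.
 */
static XLogPtr
oldest_needed_wal(const ControlData *ctl)
{
    return ctl->prevCheckPoint != INVALID_XLOG_PTR ? ctl->prevCheckPoint
                                                   : ctl->checkPoint;
}
```

The cost of the scheme shows up in oldest_needed_wal(): log segments must be retained back to the older of the two checkpoints, roughly one extra checkpoint interval of WAL, in exchange for surviving the loss of the newest checkpoint record.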

This would mean an initdb to change the format of pg_control.  However,
I already have a couple of other reasons in favor of an initdb: the
record-length bug I mentioned yesterday, and the bogus CRC algorithm.
I'm not finished reviewing the WAL code, either :-(

			regards, tom lane
