Re: WAL's single point of failure: latest CHECKPOINT record

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Justin Clift <aa2(at)bigpond(dot)net(dot)au>
Cc: pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ned Lilly <ned(at)greatbridge(dot)com>
Subject: Re: WAL's single point of failure: latest CHECKPOINT record
Date: 2001-03-02 00:22:56
Message-ID: 200103020022.TAA00652@candle.pha.pa.us
Lists: pgsql-hackers

We really need point-in-time recovery, removal of the need to vacuum,
and more full-featured replication. Hopefully most can be addressed in
7.2.

> Hi all,
>
> Out of curiosity, does anyone know of any projects that are presently
> creating PostgreSQL database recovery tools?
>
> For example database corruption recovery, Point In Time restoration, and
> such things?
>
> It might be a good project for GreatBridge to look into if no-one else
> is doing it already.
>
> Regards and best wishes,
>
> Justin Clift
> Database Administrator
>
> Tom Lane wrote:
> >
> > As the WAL stuff is currently constructed, the system will refuse to
> > start up unless the checkPoint field of pg_control points at a valid
> > checkpoint record in the WAL log.
> >
> > Now I know we write and fsync the checkpoint record before we rewrite
> > pg_control, but this still leaves me feeling mighty uncomfortable.
> > See past discussions about how fsync order doesn't necessarily mean
> > anything if the disk drive chooses to reorder writes. Since loss of
> > the checkpoint record means complete loss of the database, I think we
> > need to work harder here.
> >
> > What I'm thinking is that pg_control should have pointers to the last
> > two checkpoint records, not only the last one. If we fail to read the
> > most recent checkpoint, try the one before it (which, obviously, means
> > we must keep the log files long enough that we still have that one too).
> > We can run forward from there and redo the intervening WAL records the
> > same as we would do anyway.
> >
> > This would mean an initdb to change the format of pg_control. However
> > I already have a couple other reasons in favor of an initdb: the
> > record-length bug I mentioned yesterday, and the bogus CRC algorithm.
> > I'm not finished reviewing the WAL code, either :-(
> >
> > regards, tom lane
>
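
A minimal sketch of the two-checkpoint fallback Tom describes above,
written in Python rather than the backend's C. Every name here
(ControlFile, prev_checkpoint, read_checkpoint, choose_redo_start) is
hypothetical, the WAL is modeled as a dict mapping offsets to records,
and zlib.crc32 merely stands in for whatever CRC the real records carry.
None of this is the actual pg_control layout or startup code.

```python
import zlib
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ControlFile:
    # Hypothetical stand-in for pg_control: WAL offsets of the two
    # most recent checkpoint records.
    checkpoint: int
    prev_checkpoint: int


def make_record(payload: bytes) -> bytes:
    """Prefix the payload with a 4-byte CRC32 so corruption is detectable."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload


def read_checkpoint(wal: Dict[int, bytes], offset: int) -> Optional[bytes]:
    """Return the payload at offset, or None if it is missing or fails its CRC."""
    record = wal.get(offset)
    if record is None or len(record) < 4:
        return None
    stored_crc = int.from_bytes(record[:4], "big")
    payload = record[4:]
    return payload if zlib.crc32(payload) == stored_crc else None


def choose_redo_start(control: ControlFile, wal: Dict[int, bytes]) -> int:
    """Try the latest checkpoint first, then fall back to the one before it."""
    for offset in (control.checkpoint, control.prev_checkpoint):
        if read_checkpoint(wal, offset) is not None:
            return offset  # redo runs forward from this record
    raise RuntimeError("no valid checkpoint record found")
```

If the newest record is unreadable, choose_redo_start returns the older
offset and redo simply covers a longer stretch of log, which is why the
log files back to the previous checkpoint have to be kept around.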

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
