PITR Phase 1 - Test results

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: PITR Phase 1 - Test results
Date: 2004-04-26 15:37:27
Message-ID: 1082991844.3999.60.camel@stromboli
Lists: pgsql-hackers

I've now completed the coding of Phase 1 of PITR.

This allows a backup to be recovered and then rolled forward (all the
way) on transaction logs. This proves that the code and the design work,
and it also validates many of the earlier assumptions that were the
subject of much debate.

As noted in the previous designs, PostgreSQL talks to an external
archiver using the XLogArchive API.
I've now completed:
- the changes to PostgreSQL
- a simple archiving utility, pg_arch
Using both of these together, I have successfully:
- started pg_arch
- started postgres
- taken a backup using tar
- run pgbench for an extended period, so that the transaction logs from
the start of the run have long since been recycled
- killed postmaster
- waited for completion
- run rm -R $PGDATA
- restored using tar
- restored xlogs from the archive directory
- started postmaster and watched it recover to the end of the logs
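
The file-level part of the procedure above (backup with tar, wipe,
restore, copy the archived logs back) can be sketched as follows. This
is only an illustration on a stand-in directory: the function names are
hypothetical, the server steps (pg_arch, postmaster, pgbench) are left
out, and it assumes the logs live in the pg_xlog subdirectory of the
data directory and that the backup file is stored outside it.

```python
import os
import shutil
import tarfile


def take_backup(pgdata, backup_path):
    """Write every entry of the data directory into a tar file."""
    with tarfile.open(backup_path, "w") as tar:
        for name in sorted(os.listdir(pgdata)):
            tar.add(os.path.join(pgdata, name), arcname=name)


def restore_backup(pgdata, backup_path, archive_dir):
    """Wipe pgdata, restore it from tar, then copy archived logs back."""
    shutil.rmtree(pgdata)                       # rm -R $PGDATA
    os.makedirs(pgdata)
    with tarfile.open(backup_path, "r") as tar:
        tar.extractall(pgdata)                  # restore using tar
    xlog_dir = os.path.join(pgdata, "pg_xlog")
    os.makedirs(xlog_dir, exist_ok=True)
    # Restore xlogs from the archive directory, oldest name first.
    for name in sorted(os.listdir(archive_dir)):
        shutil.copyfile(os.path.join(archive_dir, name),
                        os.path.join(xlog_dir, name))
```

After restore_backup returns, the data directory holds the backed-up
files plus every archived log, which is the state the postmaster is
started from in the last step of the list.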

This has been run through a number of times on non-trivial tests, and
I've sat and watched the beast at work to make sure nothing weird was
happening with timing.

At this stage:
Missing Functions
- recovery does NOT yet stop at a specified point-in-time (that was
always planned for Phase 2)
- a few more log messages are required to report progress
- a debug mode is required to allow most of them to be turned off

Wrinkles
- code is system testable, but not as cute as it could be
- input from committers is now sought to complete the work
- you are strongly advised not to treat any of the patches as usable in
any real-world situation YET - that bit comes next

Bugs
- two bugs currently occur during some tests:
1. the notification mechanism as originally designed causes ALL backends
to report that a log file has closed. That works most of the time,
though it does give rise to occasional timing errors - nothing too
serious, but this inexactness could lead to later errors.
2. After restore, the notification system doesn't recover fully - this
is a straightforward one to fix
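
To make the first bug concrete: if every backend reports the same closed
log file, the receiver sees the same name several times. The sketch
below shows one generic way to collapse such duplicates. It is not the
fix planned for the patch; the class name ClosedLogTracker is
hypothetical, and the example keeps its state in a single process,
whereas real backends are separate processes and would need shared
state.

```python
class ClosedLogTracker:
    """Collapse repeated 'log file closed' reports into one event each.

    Hypothetical helper: several backends may report the same file, but
    only the first report for a given name is passed on.
    """

    def __init__(self):
        self._seen = set()

    def report_closed(self, segment_name):
        """Return True only the first time segment_name is reported."""
        if segment_name in self._seen:
            return False
        self._seen.add(segment_name)
        return True
```

A caller would act on a report only when report_closed returns True, so
the order in which backends happen to report no longer matters.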

I'm building a full patchset for this code and will upload it soon. As
you might expect over the time it's taken me to develop this, some bitrot
has set in, so I'm rebuilding it against the latest dev version now, and
will complete fixes for the two bugs mentioned above.

I'm sure some will say "no words, show me the code"... I thought you all
would appreciate some advance warning of this, to plan time to
investigate and comment upon the coding.

Best Regards, Simon Riggs, 2ndQuadrant
http://www.2ndquadrant.com
