fast stop before database system is ready

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: fast stop before database system is ready
Date: 2007-06-22 21:25:37
Message-ID: 467BF7FF.EE98.0025.0@wicourts.gov
Lists: pgsql-hackers

I apologize for not grabbing more information before the evidence was gone,
but I think there may be a vulnerability to database corruption on PITR
recovery if a stop is done with the "fast" option right after a database logs
"archive recovery complete". We normally have about 17 seconds between that
and the "database system is ready" message for a particular database.
Someone was watching the log and issued a fast stop about 1.5 seconds after
the "archive recovery complete" message. When the database came back up,
it was corrupted. (The first problem message was about a bad sibling
pointer, but the wheels pretty much fell off after that.) He deleted the
database instance, got a fresh dump, and tried again without stopping the
server at that point, and all is well.
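
A minimal sketch of how an operator script might avoid stopping the server
inside that window. It only scans log lines for the two messages quoted
above; the function name safe_to_stop and the idea of gating the stop on the
log are assumptions for illustration, not anything PostgreSQL provides, and
it does nothing about whatever caused the corruption.

```python
# Log message fragments as quoted in this report.
RECOVERY_DONE = "archive recovery complete"
SYSTEM_READY = "database system is ready"


def safe_to_stop(log_lines):
    """Return False while the log shows recovery finished but no later
    'database system is ready' message; True otherwise.

    Hypothetical helper: it only inspects the text of the log lines given.
    """
    in_window = False
    for line in log_lines:
        if RECOVERY_DONE in line:
            # Recovery finished; the server has not yet reported readiness.
            in_window = True
        elif SYSTEM_READY in line:
            # Readiness reported; the risky window is closed.
            in_window = False
    return not in_window
```

A wrapper could call this on the current server log and hold off on the fast
stop until it returns True.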

The dump used in the problem recovery attempt is now gone. I hesitate to
report this since my information is so sketchy, but thought you might want
the report anyway.

The source and target of this PITR-style copy were both PostgreSQL 8.2.4 on
SuSE Linux. For more details on the target see my recent posts about the
corrupted database which turned out to be caused by bad hardware and outdated
drivers.
( http://archives.postgresql.org/pgsql-admin/2007-06/msg00151.php )
The failed recovery was on that box, after fixing all known hardware and
driver issues.

No assistance needed.

-Kevin
