Re: fast stop before database system is ready

From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: fast stop before database system is ready
Date: 2007-06-23 20:09:36
Message-ID: 1182629376.9276.350.camel@silverbirch.site
Lists: pgsql-hackers

On Fri, 2007-06-22 at 16:25 -0500, Kevin Grittner wrote:
> I apologize for not grabbing more information before the evidence was gone,
> but I think there may be a vulnerability to database corruption on PITR
> recovery if a stop is done with the "fast" option right after a database logs
> "archive recovery complete". We normally have about 17 seconds between that
> and the "database system is ready" message for a particular database.
> Someone was watching the log and issued a fast stop about 1.5 seconds after
> the "archive recovery is complete" message. When the database came back up,
> it was corrupted. (The first problem message was about a bad sibling
> pointer, but the wheels pretty much fell off after that.) He deleted the
> database instance, got a fresh dump, and tried again without stopping the
> server at that point, and all is well.

The "archive recovery complete" message is issued too early. It should
be issued after the shutdown checkpoint that follows recovery. However,
that only matters if you issue an immediate shutdown; the problem would
not occur with a fast shutdown.

So I suspect your DBA has Oracle training and thinks that -m immediate
is a fast shutdown. It certainly is fast, but not the same thing as
-m fast.

With -m fast, the server should issue a normal shutdown checkpoint, which
will secure the backup. -m immediate means "crash the server, please",
which will interrupt the shutdown checkpoint issued at the end of
recovery, leading to an incomplete archive recovery.
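
To make the distinction concrete, here is a minimal sketch of the three
pg_ctl shutdown modes, the signal each one sends to the postmaster, and
whether the server gets to write a shutdown checkpoint before exiting.
The helper name stop_command and the data directory path are
hypothetical, chosen only for illustration; the sketch builds the
command line and does not run it.

```python
# Signal that pg_ctl sends to the postmaster for each shutdown mode.
SHUTDOWN_SIGNALS = {
    "smart": "SIGTERM",
    "fast": "SIGINT",
    "immediate": "SIGQUIT",
}

# Whether the mode lets the server write a shutdown checkpoint.
# "immediate" aborts without one, so crash recovery runs on restart.
WRITES_SHUTDOWN_CHECKPOINT = {
    "smart": True,
    "fast": True,
    "immediate": False,
}


def stop_command(data_dir, mode="fast"):
    """Build the pg_ctl argument list for stopping the server.

    Hypothetical helper: it only assembles the command and never
    executes it.
    """
    if mode not in SHUTDOWN_SIGNALS:
        raise ValueError("unknown shutdown mode: %r" % (mode,))
    return ["pg_ctl", "stop", "-D", data_dir, "-m", mode]


if __name__ == "__main__":
    # "/path/to/data" is a placeholder, not a real cluster location.
    for mode in ("smart", "fast", "immediate"):
        print(" ".join(stop_command("/path/to/data", mode)),
              "->", SHUTDOWN_SIGNALS[mode],
              "| shutdown checkpoint:", WRITES_SHUTDOWN_CHECKPOINT[mode])
```

The point of the table is that only -m immediate skips the checkpoint,
which is why it is the only mode that can hit the timing window
described here.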

I'll move the message so that it is issued after the checkpoint occurs,
which removes the timing window for -m immediate.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
