Re: 9.2.3 crashes during archive recovery

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 9.2.3 crashes during archive recovery
Date: 2013-02-13 20:52:36
Message-ID: CA+U5nM+z+ngGL49vaQrW=_bzg+Gi32rY+pQbinjqHprP+cAW6Q@mail.gmail.com
Lists: pgsql-hackers

On 13 February 2013 09:04, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote:

> Without step 3, the server would perform crash recovery, and it would work.
> But because of the recovery.conf file, the server goes into archive
> recovery, and because minRecoveryPoint is not set, it assumes that the
> system is consistent from the start.
>
> Aside from the immediate issue with truncation, the system really isn't
> consistent until the WAL has been replayed far enough, so it shouldn't open
> for hot standby queries. There might be other, later, changes already
> flushed to data files. The system has no way of knowing how far it needs to
> replay the WAL to become consistent.
>
> At least in back-branches, I'd call this a pilot error. You can't turn a
> master into a standby just by creating a recovery.conf file. At least not if
> the master was not shut down cleanly first.
>
> If there's a use case for doing that, maybe we can do something better in
> HEAD. If the control file says that the system was running
> (DB_IN_PRODUCTION), but there is a recovery.conf file, we could do crash
> recovery first, until we reach the end of WAL, and go into archive recovery
> mode after that. We'd recover all the WAL files in pg_xlog as far as we can,
> same as in crash recovery, and only start restoring files from the archive
> once we reach the end of WAL in pg_xlog. At that point, we'd also consider
> the system as consistent, and start up for hot standby.
>
> I'm not sure that's worth the trouble, though. Perhaps it would be better to
> just throw an error if the control file state is DB_IN_PRODUCTION and a
> recovery.conf file exists. The admin can always start the server normally
> first, shut it down cleanly, and then create the recovery.conf file.

Now I've read the whole thing...

The problem is that we start up Hot Standby before we reach the min
recovery point, because that point isn't recorded. For me, the thing to
do is to set the min recovery point to the end of WAL when the state is
DB_IN_PRODUCTION. That way we don't need to do any new writes, and we
don't risk people seeing inconsistent results if they do this.
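
A rough sketch of that rule, as a simplified model rather than actual
PostgreSQL source. The struct, the helper names and the integer WAL
positions are assumptions made for illustration; only DB_IN_PRODUCTION
and minRecoveryPoint come from the discussion above.

```c
/* Simplified model for illustration only; not PostgreSQL source code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* WAL position, modelled as a plain integer */

typedef enum
{
    DB_SHUTDOWNED,              /* clean shutdown */
    DB_IN_PRODUCTION            /* server was running, i.e. it crashed */
} DBState;

/* Hypothetical stand-in for the two control file fields that matter here. */
typedef struct
{
    DBState     state;
    XLogRecPtr  minRecoveryPoint;   /* 0 means "not set" */
} ControlFileModel;

/*
 * If the control file says the server was in production, treat the end of
 * the locally available WAL as the minimum recovery point. Nothing is
 * written back to disk; the value is only computed in memory.
 */
static XLogRecPtr
effective_min_recovery_point(const ControlFileModel *cf,
                             XLogRecPtr end_of_local_wal)
{
    assert(cf != NULL);
    if (cf->state == DB_IN_PRODUCTION &&
        cf->minRecoveryPoint < end_of_local_wal)
        return end_of_local_wal;
    return cf->minRecoveryPoint;
}

/* Hot Standby may open only once replay has reached that point. */
static bool
can_open_hot_standby(const ControlFileModel *cf,
                     XLogRecPtr end_of_local_wal,
                     XLogRecPtr replayed_upto)
{
    return replayed_upto >= effective_min_recovery_point(cf, end_of_local_wal);
}
```

With state DB_IN_PRODUCTION, an unset minRecoveryPoint and local WAL
ending at position 1000, can_open_hot_standby() stays false at replay
position 500 and becomes true once replay reaches 1000.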

But I think that still leaves you with a timeline problem when turning
a master back into a standby.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
