Re: Hot standby, recovery infra

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot standby, recovery infra
Date: 2009-02-04 18:25:31
Message-ID: 1233771931.4500.325.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On Wed, 2009-02-04 at 19:03 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > * I think we are now renaming the recovery.conf file too early. The
> > comment says "We have already restored all the WAL segments we need from
> > the archive, and we trust that they are not going to go away even if we
> > crash." We have, but the files overwrite each other as they arrive, so
> > if the last restartpoint is not in the last restored WAL file then it
> > will only exist in the archive. The recovery.conf is the only place
> > where we store the information on where the archive is and how to access
> > it, so by renaming it out of the way we will be unable to crash recover
> > until the first checkpoint is complete. So the way this was in the
> > original patch is the correct way to go, AFAICS.
>
> I can see what you mean now. In fact we're not safe even when the last
> restartpoint is in the last restored WAL file, because we always restore
> segments from the archive to a temporary filename.
>
> Yes, renaming recovery.conf at the first checkpoint avoids that problem.
> However, it means that we'll re-enter archive recovery if we crash
> before that checkpoint is finished. Won't that cause havoc if more files
> have appeared to the archive since the crash, and we restore those even
> though we already started up from an earlier point? How do the timelines
> work in that case?

More archive files being added to archive is possible, but unlikely.
What *is* a definite problem is that if we ended recovery by manual
command (possible with HS patch) or recovery target then we would have
no record of which point it was that we finished at. It is then possible
that the recovery.conf has been re-edited, causing re-recovery to end at
a different place. That seems like a bad thing.

> We could avoid that by performing a good old startup checkpoint, but I
> quite like the fast failover time we get without it.

ISTM it's either slow failover or (fast failover, but restart archive
recovery if crashes).

I would suggest that at end of recovery we write the last LSN to the
control file, so if we crash recover then we will always end archive
recovery at the same place again should we re-enter it. So we would have
a recovery_target_lsn that overrides recovery_target_xid etc..

Given where we are, I would suggest we go for the slow failover option
in this release. Doing otherwise is a risky decision with little gain.
BGwriter-in-recovery is a good thing of itself and we have other code to
review yet with a higher importance level, and the rest of HS is code
I'm actually much happier with. I've spent weeks trying to get the
transition between recovery and normal running clean, but it seems like
time to say I didn't get it right *enough* to be absolutely certain.
(Just the fast failover part of patch!). Thanks for the review.

> > We should continue to measure performance of recovery in the light
> of
> > these changes. I still feel that fsyncing the control file on each
> > XLogFileRead() will give a noticeable performance penalty, mostly
> > because we know doing exactly the same thing in normal running
> caused a
> > performance penalty. But that is easily changed and cannot be done
> with
> > any certainty without wider feedback, so no reason to delay code
> commit.
>
> I've changed the way minRecoveryPoint is updated now anyway, so it no
> longer happens every XLogFileRead().

Care to elucidate?

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2009-02-04 18:32:47 Re: [HACKERS] BUG #4516: FOUND variable does not work after RETURN QUERY
Previous Message Bruce Momjian 2009-02-04 18:23:04 Re: Auto-updated fields