Re: Hot standby, recovery infra

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot standby, recovery infra
Date: 2009-02-04 17:03:49
Message-ID: 4989CA75.4030206@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Simon Riggs wrote:
> * I think we are now renaming the recovery.conf file too early. The
> comment says "We have already restored all the WAL segments we need from
> the archive, and we trust that they are not going to go away even if we
> crash." We have, but the files overwrite each other as they arrive, so
> if the last restartpoint is not in the last restored WAL file then it
> will only exist in the archive. The recovery.conf is the only place
> where we store the information on where the archive is and how to access
> it, so by renaming it out of the way we will be unable to crash recover
> until the first checkpoint is complete. So the way this was in the
> original patch is the correct way to go, AFAICS.

I can see what you mean now. In fact we're not safe even when the last
restartpoint is in the last restored WAL file, because we always restore
segments from the archive to a temporary filename.

Yes, renaming recovery.conf at the first checkpoint avoids that problem.
However, it means that we'll re-enter archive recovery if we crash
before that checkpoint is finished. Won't that cause havoc if more files
have appeared to the archive since the crash, and we restore those even
though we already started up from an earlier point? How do the timelines
work in that case?

We could avoid that by performing a good old startup checkpoint, but I
quite like the fast failover time we get without it.

> * my original intention was to deprecate log_restartpoints and would
> still like to do so. log_checkpoints does just as well for that. Even
> less code than before...

Ok, done.

> * comment on BgWriterShmemInit() refers to CHECKPOINT_IS_STARTUP, but
> the actual define is CHECKPOINT_STARTUP. Would prefer the "is" version
> because it matches the IS_SHUTDOWN naming.

Fixed.

> * In CreateCheckpoint() the if test on TruncateSubtrans() has been
> removed, but the comment has not been changed (to explain why).

Thanks, comment updated. (we now call CreateCheckPoint() only after
recovery is finished)

> We should continue to measure performance of recovery in the light of
> these changes. I still feel that fsyncing the control file on each
> XLogFileRead() will give a noticeable performance penalty, mostly
> because we know doing exactly the same thing in normal running caused a
> performance penalty. But that is easily changed and cannot be done with
> any certainty without wider feedback, so no reason to delay code commit.

I've changed the way minRecoveryPoint is updated now anyway, so it no
longer happens every XLogFileRead().

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2009-02-04 17:11:38 Re: add_path optimization
Previous Message Teodor Sigaev 2009-02-04 16:56:22 Re: [PATCHES] GIN improvements