Re: Hot standby, recovery infra

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot standby, recovery infra
Date: 2009-01-29 07:34:48
Message-ID: 49815C18.5040609@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Simon Riggs wrote:
> On Thu, 2009-01-29 at 12:18 +0900, Fujii Masao wrote:
>> Though this is a matter of taste, I think that it's weird that bgwriter
>> runs with ThisTimeLineID = 0 during recovery. This is because
>> XLogCtl->ThisTimeLineID is set at the end of recovery. ISTM this will
>> be a cause of bug in the near future, though this is harmless currently.
>
> It doesn't. That's exactly what InitXLogAccess() was for.

It does *during recovery*, before InitXLogAccess is called. Yeah, it's
harmless currently. It would be pretty hard to keep it up-to-date in
bgwriter and other processes. I think it's better to keep it at 0, which
is clearly an invalid value, than try to keep it up-to-date and risk
using an old value. TimeLineID is not used in a lot of places,
currently. It might be a good idea to add some "Assert(TimeLineID != 0)"
to places where it used.

>>> + /*
>>> + * XXX: Should we try to perform restartpoints if we're not in consistent
>>> + * recovery? The bgwriter isn't doing it for us in that case.
>>> + */
>> I think yes. This is helpful for the system which it takes a long time to get
>> a base backup, ie. it also takes a long time to reach a consistent recovery
>> point.
>
> The original patch did this.

Yeah, I took it out. ISTM if you restore from a base backup, you'd want
to run it until consistent recovery anyway. We can put it back in if
people feel it's helpful. There's two ways to do it: we can fire up the
bgwriter before reaching consistent recovery point, or we can do the
restartpoints in startup process before that point. I'm inclined to fire
up the bgwriter earlier. That way, bgwriter remains responsible for all
checkpoints and restartpoints, which seems clearer. I can't see any
particular reason why bgwriter startup and reaching the consistent
recovery point need to be linked together.

>>> +CreateRestartPoint(int flags)
>> <snip>
>>> + * We rely on this lock to ensure that the startup process doesn't exit
>>> + * Recovery while we are half way through a restartpoint. XXX ?
>>> */
>>> + LWLockAcquire(CheckpointLock, LW_EXCLUSIVE);
>> Is this comment correct? CheckpointLock cannot prevent the startup process
>> from exiting recovery because the startup process doesn't acquire that lock.
>
> The original patch acquired CheckpointLock during exitRecovery to prove
> that a restartpoint was not in progress. It no longer does this, so not
> sure if Heikki has found another way and the comment is wrong, or that
> removing the lock request is a bug.

The comment is obsolete. There's no reason for startup process to wait
for a restartpoint to finish. Looking back at the archives and the
history of that, I got the impression that in a very early version of
the patch, startup process still created a shutdown checkpoint after
recovery. To do that, it had to interrupt an ongoing restartpoint. That
was deemed too dangerous/weird, so it was changed to wait for it to
finish instead. But since the startup process no longer creates a
shutdown checkpoint, the waiting became obsolete, right?

If there's a restartpoint in progress when we reach the end of recovery,
startup process will RequestCheckpoint as usual. That will cause
bgwriter to hurry the on-going restartpoint, skipping the usual delays,
and start the checkpoint as soon as the restartpoint is finished.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2009-01-29 08:03:34 Re: Commitfest infrastructure (was Re: 8.4 release =?iso-8859-1?q?=09planning?=)
Previous Message Simon Riggs 2009-01-29 07:20:58 Re: Hot standby, recovery infra