Re: Hot standby, recovery infra

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot standby, recovery infra
Date: 2009-02-04 11:35:19
Message-ID: 49897D77.503@enterprisedb.com
Lists: pgsql-hackers

Fujii Masao wrote:
> On Fri, Jan 30, 2009 at 11:55 PM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> The startup process now catches SIGTERM, and calls proc_exit() at the next
>> WAL record. That's what will happen in a fast shutdown. Unexpected death of
>> the startup process is treated the same as a backend/auxiliary process
>> crash.
>
> If unexpected death of the startup process happens in automatic recovery
> after a crash, postmaster and bgwriter may get stuck, because HandleChildCrash()
> can be called before the FatalError flag is reset. When FatalError is still true,
> HandleChildCrash() doesn't kill any auxiliary processes. So bgwriter survives
> the crash and postmaster waits forever for the death of bgwriter while in recovery
> status (which means that new connections cannot be started). Is this a bug?

Yes, and in fact I ran into it myself yesterday while testing. It seems
that we should reset FatalError earlier, i.e. when the recovery starts
and bgwriter is launched. I'm not sure why in CVS HEAD we don't reset
FatalError until after the startup process has finished. Resetting it as
soon as all the processes have been terminated and the startup process is
launched again would seem like a more obvious place to do it. The only
difference that I can see is that if someone tries to connect while the
startup process is running, you currently get a "the database system is in
recovery mode" message instead of "the database system is starting up"
if we're reinitializing after a crash. We can keep that behavior; we just
need to add another flag meaning "reinitializing after crash" that isn't
reset until the recovery is over.
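
To make the proposal concrete, here is a toy model of the two flags. It is
only a sketch, not postmaster.c: the struct, the toy_* functions and the
crash_recovery flag are made-up names, and it assumes that the crash
handler skips signalling the surviving processes whenever the FatalError
equivalent is still set from an earlier crash.

```c
#include <stdbool.h>

/* Toy model of the state discussed above; NOT the real postmaster.c. */
typedef struct
{
    bool fatal_error;     /* models FatalError */
    bool crash_recovery;  /* hypothetical flag: "reinitializing after crash" */
    bool bgwriter_alive;
} ToyPostmaster;

/* All children have exited after a crash; relaunch startup and bgwriter. */
static void
toy_relaunch_after_crash(ToyPostmaster *pm, bool reset_early)
{
    pm->bgwriter_alive = true;
    pm->crash_recovery = true;
    if (reset_early)
        pm->fatal_error = false;  /* proposed: reset as soon as we relaunch */
}

/*
 * A child died unexpectedly. Assumption: survivors are signalled only if
 * the flag was not already set by an earlier crash.
 */
static void
toy_handle_child_crash(ToyPostmaster *pm)
{
    if (!pm->fatal_error)
        pm->bgwriter_alive = false;
    pm->fatal_error = true;
}

/* Recovery completed: clear both flags. */
static void
toy_recovery_finished(ToyPostmaster *pm)
{
    pm->fatal_error = false;
    pm->crash_recovery = false;
}

/* Message for a client that tries to connect before we are ready. */
static const char *
toy_refusal_message(const ToyPostmaster *pm)
{
    if (pm->fatal_error || pm->crash_recovery)
        return "the database system is in recovery mode";
    return "the database system is starting up";
}

/*
 * Scenario: a crash, a relaunch, then the startup process dies during
 * recovery. Returns true if bgwriter is left running, i.e. the hang.
 */
static bool
toy_bgwriter_left_behind(bool reset_early)
{
    ToyPostmaster pm = {false, false, true};

    toy_handle_child_crash(&pm);                /* first crash */
    toy_relaunch_after_crash(&pm, reset_early);
    toy_handle_child_crash(&pm);                /* startup process dies */
    return pm.bgwriter_alive;
}
```

In this model toy_bgwriter_left_behind(false) leaves bgwriter running,
which is the hang described above, while toy_bgwriter_left_behind(true)
does not. The separate crash_recovery flag keeps the "recovery mode"
message unchanged for clients even though the other flag is reset early.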

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
