Re: Improve shutdown during online backup, take 4

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at>
Cc: "Magnus Hagander" <magnus(at)hagander(dot)net>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: Improve shutdown during online backup, take 4
Date: 2008-04-24 14:25:25
Message-ID: 28186.1209047125@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-patches

"Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at> writes:
> Tom Lane wrote:
>> Lastly, the changes to pmdie's SIGINT handling seem quite bogus.
>> Don't you need to transition into WAIT_BACKUP rather than WAIT_BACKENDS
>> state in that case too? Shouldn't you do CancelBackup *before*
>> PostmasterStateMachine? The thing screams of race conditions.

> I suspect there must be a misunderstanding.
> You cannot really mean that the postmaster should enter WAIT_BACKUP
> state on a fast shutdown request.

Why not? It'll fall out of the state again immediately in
PostmasterStateMachine, no, if we do a CancelBackup here? I don't think
these two paths of control should be any more different than really
necessary.

> Sure, CancelBackup could be called earlier. It doesn't do much more than
> rename a file.
> My reason for calling it late was that backup mode should really only be
> cancelled if we manage to shutdown cleanly, and this is not clear until
> the last child is gone.

Well, if there were anything conditional about calling it, then maybe
that argument would hold some water, but the way you've got it here it
*will* get called anyway, just after the PostmasterStateMachine call
... except in the corner case where there were no child processes left
and so PostmasterStateMachine manages to decide it's ready to exit().
So far from implementing the logic you suggest, this arrangement never
gets it right, and has a race condition case in which it gets it exactly
backward.

The other reason for the remark about race conditions is that the
PostmasterStateMachine call should absolutely be the last thing that
pmdie() does --- putting anything after it is wrong, especially things
that might alter the PM state, as indeed CancelBackup could. The reason
for having that in the signal handler is to cover the possibility that
no such call will occur immediately when we return to the wait loop.
In general all of the condition handlers in postmaster.c should be of
the form "respond to the immediate condition, and then let
PostmasterStateMachine decide if there's anything else to do".

If you actually want the behavior you propose, then the only correct way
to implement it is to embed it into the state machine logic, ie, do the
CancelBackup inside PostmasterStateMachine in some state transition
taken after the last child is gone.

regards, tom lane

In response to

Responses

Browse pgsql-patches by date

  From Date Subject
Next Message Bruce Momjian 2008-04-24 14:28:25 Re: Proposed patch - psql wraps at window width
Previous Message Magnus Hagander 2008-04-24 14:24:06 Re: Improve shutdown during online backup, take 4