Re: Revised patch for fixing archiver shutdown behavior

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-patches(at)postgreSQL(dot)org
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Revised patch for fixing archiver shutdown behavior
Date: 2008-01-10 01:16:36
Message-ID: 22574.1199927796@sss.pgh.pa.us
Lists: pgsql-patches

I wrote:
> One point needing discussion is that the postmaster is currently
> coded not to send SIGUSR1 to the archiver if a fast-mode shutdown
> is under way. I duplicated that in the added SIGUSR1 signal here,
> but I wonder whether it is sane or not. Comments?

After chewing on that for a while, I decided it was bogus. If we are
going to have a policy that the archiver gets a chance to archive
everything, that shouldn't depend on fast vs. smart shutdown; those
alternatives determine whether we kick clients out ungracefully,
not whether we take extra risks with committed data.

I think we should allow the archiver to finish out its tasks fully
in all non-crash cases except one: if we got SIGTERM from init.
In that case there's a very great risk of being SIGKILL'd before
we can finish archiving. The postmaster cannot easily tell whether
its SIGTERM came from init or not, but we can drive this off the
archiver itself getting SIGTERM'd. I propose that if the archiver
receives SIGTERM, it should cease to issue any new archive commands,
but just wait till it sees the postmaster exit. (It can't exit
right away, since there's a race condition: the postmaster might
not have been SIGTERM'd yet, and might therefore spawn a new
archiver, which would have no idea it's unsafe to do anything more.)
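
As a rough illustration of that rule (a sketch only, not code from the
attached patch), the archiver's per-cycle decision could look like the
following. All names here are hypothetical, and the choice to exit when
the postmaster is gone even without a SIGTERM is an assumption.

```c
#include <signal.h>
#include <stdbool.h>

/* Hypothetical flag; set asynchronously, so it must be sig_atomic_t. */
volatile sig_atomic_t sketch_got_sigterm = 0;

/* Hypothetical SIGTERM handler: only records that the signal arrived. */
void
sketch_sigterm_handler(int signo)
{
    (void) signo;
    sketch_got_sigterm = 1;
}

typedef enum
{
    SKETCH_RUN_COMMANDS,        /* normal operation: keep archiving */
    SKETCH_WAIT_FOR_POSTMASTER, /* SIGTERM seen: issue no new commands */
    SKETCH_EXIT                 /* postmaster is gone: safe to exit */
} SketchAction;

/*
 * Decide what the archiver does on this cycle.  After SIGTERM it does not
 * exit immediately; it idles until the postmaster has exited, so that a
 * not-yet-signaled postmaster cannot spawn a fresh archiver that would
 * not know it is unsafe to do more work.
 */
SketchAction
sketch_next_action(bool sigterm_seen, bool postmaster_alive)
{
    if (!postmaster_alive)
        return SKETCH_EXIT;     /* assumption: exit whenever postmaster is gone */
    if (sigterm_seen)
        return SKETCH_WAIT_FOR_POSTMASTER;
    return SKETCH_RUN_COMMANDS;
}
```

The handler does nothing but set a flag, so the decision itself stays in
the main loop, where it can be combined with a postmaster-liveness check.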

There's an obvious failure mode in that, which is that a randomly
issued SIGTERM to the archiver would shut down archiving indefinitely.
We can guard against that with a timeout: the archiver should exit
a minute or two after being SIGTERM'd, even if the postmaster is still
there. That should certainly be enough delay to avoid the race
condition, and if in fact everything is still hunky-dory the
postmaster will immediately spawn a new archiver.
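
The timeout guard can be sketched the same way. This is an illustration
under stated assumptions, not the patch itself: the function name, the
60-second constant (standing in for "a minute or two"), and the
convention that a zero timestamp means "no SIGTERM received" are all
hypothetical.

```c
#include <stdbool.h>
#include <time.h>

/* Assumed value; the text above only says "a minute or two". */
#define SKETCH_SIGTERM_TIMEOUT_SECS 60.0

/*
 * Return true if the archiver should exit now.
 *
 * sigterm_time is when SIGTERM was received, or 0 if it never was
 * (hypothetical convention).  The archiver exits once the postmaster is
 * gone, or once the timeout since SIGTERM has elapsed even though the
 * postmaster is still alive; in the latter case a healthy postmaster
 * simply starts a new archiver.
 */
bool
sketch_should_exit(bool postmaster_alive, time_t sigterm_time, time_t now)
{
    if (!postmaster_alive)
        return true;
    if (sigterm_time != 0 &&
        difftime(now, sigterm_time) >= SKETCH_SIGTERM_TIMEOUT_SECS)
        return true;
    return false;
}
```

With these assumptions, a stray SIGTERM delays archiving by at most the
timeout rather than stopping it indefinitely.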

Hence, attached revised patch ...

regards, tom lane

Attachment: archiver-shutdown-2.patch (application/octet-stream, 11.2 KB)
