Re: backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks)

From: "MauMau" <maumau307(at)gmail(dot)com>
To: "Alvaro Herrera" <alvherre(at)2ndquadrant(dot)com>, "Noah Misch" <noah(at)leadboat(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Andres Freund" <andres(at)2ndquadrant(dot)com>, "Peter Eisentraut" <peter_e(at)gmx(dot)net>, <pgsql-hackers(at)postgreSQL(dot)org>, "Robert Haas" <robertmhaas(at)gmail(dot)com>
Subject: Re: backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks)
Date: 2013-06-25 12:57:52
Message-ID: B745B64BB2E54268BC2EAA7F23154376@maumau
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

From: "Alvaro Herrera" <alvherre(at)2ndquadrant(dot)com>
> Yeah, I see that --- after removing that early exit, there are unwanted
> messages. And in fact there are some signals sent that weren't
> previously sent. Clearly we need something here: if we're in immediate
> shutdown handler, don't signal anyone (because they have already been
> signalled) and don't log any more messages; but the cleaning up of
> postmaster's process list must still be carried out.
>
> Would you please add that on top of the attached cleaned up version of
> your patch?

Thanks. I'll do that tomorrow at the earliest.

> Noah Misch escribió:
>> On Sun, Jun 23, 2013 at 01:55:19PM +0900, MauMau wrote:
>
>> > the clients at the immediate shutdown. The code gets much simpler. In
>> > addition, it may be better that we similarly send SIGKILL in backend
>> > crash (FatalError) case, eliminate the use of SIGQUIT and remove
>> > quickdie() and other SIGQUIT handlers.
>>
>> My take is that the client message has some value, and we shouldn't just
>> discard it to simplify the code slightly. Finishing the shutdown quickly
>> has
>> value, of course. The relative importance of those things should guide
>> the
>> choice of a timeout under method #2, but I don't see a rigorous way to
>> draw
>> the line. I feel five seconds is, if anything, too high.
>
> There's obviously a lot of disagreement on 5 seconds being too high or
> too low. We have just followed SysV init's precedent of waiting 5
> seconds by default between sending signals TERM and QUIT during a
> shutdown. I will note that during a normal shutdown, services are
> entitled to do much more work than just signal all their children to
> exit immediately; and yet I don't find much evidence that this period is
> inordinately short. I don't feel strongly that it couldn't be shorter,
> though.
>
>> What about using deadlock_timeout? It constitutes a rough barometer on
>> the
>> system's tolerance for failure detection latency, and we already overload
>> it
>> by having it guide log_lock_waits. The default of 1s makes sense to me
>> for
>> this purpose, too. We can always add a separate
>> immediate_shutdown_timeout if
>> there's demand, but I don't expect such demand. (If we did add such a
>> GUC,
>> setting it to zero could be the way to request method 1. If some folks
>> strongly desire method 1, that's probably the way to offer it.)
>
> I dunno. Having this be configurable seems overkill to me. But perhaps
> that's the way to satisfy most people: some people could set it very
> high so that they could have postmaster wait longer if they believe
> their server is going to be overloaded; people wishing immediate SIGKILL
> could get that too, as you describe.
>
> I think this should be a separate patch, however.

I think so, too. We can add a parameter later if we find it highly
necessary after some experience in the field.

Regards
MauMau

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2013-06-25 13:01:52 Re: Possible bug in CASE evaluation
Previous Message Yuri Levinsky 2013-06-25 12:48:19 Hash partitioning.