Re: backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks)

From: Noah Misch <noah(at)leadboat(dot)com>
To: MauMau <maumau307(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks)
Date: 2013-06-23 18:47:54
Message-ID: 20130623184754.GA826248@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Jun 23, 2013 at 01:55:19PM +0900, MauMau wrote:
> From: "Robert Haas" <robertmhaas(at)gmail(dot)com>
>> On Fri, Jun 21, 2013 at 10:02 PM, MauMau <maumau307(at)gmail(dot)com> wrote:
>>> I'm comfortable with 5 seconds. We are talking about the interval
>>> between
>>> sending SIGQUIT to the children and then sending SIGKILL to them. In
>>> most
>>> situations, the backends should terminate immediately. However, as I
>>> said a
>>> few months ago, ereport() call in quickdie() can deadlock
>>> indefinitely. This
>>> is a PostgreSQL's bug.
>>
>> So let's fix that bug. Then we don't need this.

[quoted part filtered to undo caps lock]
> There are two ways to fix that bug. You are suggesting 1 of the
> following two methods, aren't you?

I suspect Robert meant to prevent the deadlock by limiting quickdie() to
calling only async-signal-safe functions. Implementing that looked grotty
when we last discussed it, though, and it would not help against a library or
trusted PL function suspending SIGQUIT.

> 1. (Robert-san's idea)
> Upon receipt of immediate shutdown request, postmaster sends SIGKILL to
> its children.
>
> 2. (Tom-san's idea)
> Upon receipt of immediate shutdown request, postmaster first sends
> SIGQUIT to its children, wait a while for them to terminate, then sends
> SIGKILL to them.
>
>
> At first I proposed 1. Then Tom san suggested 2 so that the server is as
> friendly to the clients as now by notifying them of the server shutdown.
> I was convinced by that idea and chose 2.
>
> Basically, I think both ideas are right. They can solve the original
> problem.

Agreed.

> However, re-considering the meaning of "immediate" shutdown, I feel 1 is
> a bit better, because trying to do something in quickdie() or somewhere
> does not match the idea of "immediate". We may not have to be friendly to

Perhaps so, but more important than the definition of the word "immediate" is
what an immediate shutdown has long meant in PostgreSQL. All this time it has
made some effort to send a message to the client. Note also that the patch
under discussion affects both immediate shutdowns and the automatic reset in
response to a backend crash. I think the latter is the more-important use
case, and the message is nice to have there.

> the clients at the immediate shutdown. The code gets much simpler. In
> addition, it may be better that we similarly send SIGKILL in backend
> crash (FatalError) case, eliminate the use of SIGQUIT and remove
> quickdie() and other SIGQUIT handlers.

My take is that the client message has some value, and we shouldn't just
discard it to simplify the code slightly. Finishing the shutdown quickly has
value, of course. The relative importance of those things should guide the
choice of a timeout under method #2, but I don't see a rigorous way to draw
the line. I feel five seconds is, if anything, too high.

What about using deadlock_timeout? It constitutes a rough barometer on the
system's tolerance for failure detection latency, and we already overload it
by having it guide log_lock_waits. The default of 1s makes sense to me for
this purpose, too. We can always add a separate immediate_shutdown_timeout if
there's demand, but I don't expect such demand. (If we did add such a GUC,
setting it to zero could be the way to request method 1. If some folks
strongly desire method 1, that's probably the way to offer it.)

Thanks,
nm

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2013-06-23 19:22:19 Re: changeset generation v5-01 - Patches & git tree
Previous Message Kevin Grittner 2013-06-23 17:32:05 Re: changeset generation v5-01 - Patches & git tree