Re: [RFC] Should we fix postmaster to avoid slow shutdown?

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC] Should we fix postmaster to avoid slow shutdown?
Date: 2016-11-21 20:15:51
Message-ID: CA+TgmobkffkFeV5zQeQST=xpZpMVAYMfQkUnqg6PUMDMO6FLRg@mail.gmail.com
Lists: pgsql-hackers

On Sun, Nov 20, 2016 at 10:20 PM, Tsunakawa, Takayuki
<tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com> wrote:
> The reasons why I proposed this patch are:
>
> * It happened in a highly mission-critical production system of a customer who uses 9.2.
>
> * 9.4's solution is not perfect, because it wastes 5 seconds anyway, which is unexpected for users. The customer's requirement includes failover within 30 seconds, so 5 seconds can be seen as a risk.
> Plus, I'm worried that the SIGKILLed process might not disappear if it's writing to network storage such as NFS.
>
> * And first of all, the immediate shutdown should shut the server down immediately without doing anything heavy, as the name means.

So there are two questions here:

1. Should we try to avoid having the stats collector write a stats
file during an immediate shutdown? The file will be removed anyway
during crash recovery, so writing it is pointless. I think you are
right that 9.4's solution here is not perfect, because of the 5 second
delay, and also because if the stats collector is stuck inside the
kernel trying to write to the OS, it may be in a non-interruptible
wait state where even SIGKILL has no immediate effect. Anyway, it's
stupid even from a performance point of view to waste time writing a
file that we're just going to nuke.

2. Should we close listen sockets sooner during an immediate shutdown?
I agree with Tom and Peter that this isn't a good idea. People
expect the sockets not to go away until the end - e.g. they use
PQping() to test the server status, or they connect just to see what
error they get - and the fact that a client application could
hypothetically generate such a relentless stream of connection
attempts that the dead-end backends thereby created slow down shutdown
is not in my mind a sufficient reason to change the behavior.

So I think patch 001 (question 1) should proceed and patch 002
(question 2) should be rejected.
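
To make the idea in point 1 concrete, here is a minimal sketch, not the
actual patch. The names ShutdownMode, should_write_stats_file() and
finish_collector() are hypothetical and do not exist in the PostgreSQL
source; the only point is that the exit path checks the shutdown mode
before doing any file I/O, so an immediate shutdown never enters a
write that could block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical shutdown modes; not PostgreSQL's real definitions. */
typedef enum
{
    SHUTDOWN_NORMAL,
    SHUTDOWN_IMMEDIATE
} ShutdownMode;

/*
 * The stats file is removed during crash recovery, which follows an
 * immediate shutdown, so writing it in that mode is wasted work.
 */
static bool
should_write_stats_file(ShutdownMode mode)
{
    return mode != SHUTDOWN_IMMEDIATE;
}

/*
 * Hypothetical exit path for a collector-like process.
 * Returns true only if the file was actually written.
 */
static bool
finish_collector(ShutdownMode mode, const char *path, long counter)
{
    FILE   *fp;
    bool    ok;

    assert(path != NULL);

    /* Immediate shutdown: skip all file I/O and return right away. */
    if (!should_write_stats_file(mode))
        return false;

    fp = fopen(path, "w");
    if (fp == NULL)
        return false;

    ok = fprintf(fp, "%ld\n", counter) > 0;
    if (fclose(fp) != 0)
        ok = false;
    return ok;
}
```

With this shape, finish_collector(SHUTDOWN_IMMEDIATE, ...) returns
without ever opening the file, so the process cannot get stuck in the
kernel on that write and there is nothing to wait 5 seconds for.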

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
