Re: [RFC] Should we fix postmaster to avoid slow shutdown?

From: "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>
To: 'Robert Haas' <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC] Should we fix postmaster to avoid slow shutdown?
Date: 2016-11-21 03:20:41
Message-ID: 0A3221C70F24FB45833433255569204D1F653D1B@G01JPEXMBYT05
Lists: pgsql-hackers

From: Robert Haas [mailto:robertmhaas(at)gmail(dot)com]
> On Fri, Nov 18, 2016 at 4:12 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
> >> Tom Lane wrote:
> >>> IMO it's not, and closer analysis says that this patch series is an
> >>> attempt to solve something we already fixed, better, in 9.4.
> >
> >> ... by the same patch submitter.
> >
> > [ confused ] The commit log credits 82233ce7e to MauMau and yourself.
>
> IIUC, MauMau = Tsunakawa Takayuki.

Yes, it's me. I'm pleased that you remember me!

First, I understand that zapping the stats file during recovery can be a problem. In fact, I was the one who proposed adding a sentence to the manual saying that the stats file is reset after an immediate shutdown. I think that problem is a separate topic and should be addressed in a new thread.

The reasons why I proposed this patch are:

* The problem occurred in a highly mission-critical production system of a customer running 9.2.

* 9.4's solution is not perfect, because it still wastes 5 seconds, which users do not expect. The customer's requirements include failover within 30 seconds, so losing 5 seconds can be seen as a risk.
In addition, I'm worried that a SIGKILLed process might not disappear if it is writing to network storage such as NFS.

* Above all, an immediate shutdown should shut the server down immediately without doing anything heavy, as the name implies.

So, I think this patch should also be applied to later releases. The purpose of the 9.4 patch was to work around a bug in PostgreSQL, where the ereport() in quickdie() gets stuck waiting for malloc()'s lock to be released.
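
To make the 5-second cost concrete, here is a minimal standalone sketch of the escalation pattern discussed above: send SIGQUIT, wait for a grace period, then fall back to SIGKILL. This is not the postmaster code. The names immediate_shutdown, spawn_stuck_child and grace_seconds are hypothetical, and the child that ignores SIGQUIT only stands in for a backend stuck in quickdie().

```python
import signal
import subprocess
import sys
import time


def spawn_stuck_child():
    """Start a child that ignores SIGQUIT (hypothetical stand-in for a stuck backend)."""
    code = (
        "import signal, time\n"
        "signal.signal(signal.SIGQUIT, signal.SIG_IGN)\n"
        "print('ready', flush=True)\n"
        "time.sleep(60)\n"
    )
    child = subprocess.Popen([sys.executable, "-c", code], stdout=subprocess.PIPE)
    child.stdout.readline()  # wait until the child has installed its handler
    return child


def immediate_shutdown(child, grace_seconds=5.0):
    """Send SIGQUIT, then SIGKILL if the child is still alive after grace_seconds.

    Returns the number of seconds the shutdown took.
    """
    start = time.monotonic()
    child.send_signal(signal.SIGQUIT)
    try:
        child.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        # The child did not exit in time, so escalate to SIGKILL.
        child.kill()
        child.wait()
    return time.monotonic() - start
```

With a stuck child, the whole grace period always elapses before SIGKILL is sent, and that wait is the delay I am concerned about.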

Regards
Takayuki Tsunakawa
