Re: [RFC] Should we fix postmaster to avoid slow shutdown?

From: "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>
To: 'Tom Lane' <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC] Should we fix postmaster to avoid slow shutdown?
Date: 2016-11-22 06:56:29
Message-ID: 0A3221C70F24FB45833433255569204D1F65683A@G01JPEXMBYT05
Lists: pgsql-hackers

From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
> The point I was trying to make is that I think the forced-removal behavior
> is not desirable, and therefore committing a patch that makes it be graven
> in stone is not desirable either.

I totally agree that we should pursue a way to avoid the complete loss of stats files. Personally, I would like to combine that with the idea of persistent performance diagnosis information for long-term analysis (IIRC, someone proposed it). However, I don't think my patch will make everyone forget about the problem of stats file loss during recovery. The problem exists with or without my patch, and my patch doesn't have the power to dilute the importance of the problem. If you are worried that the problem will be forgotten, we can add an entry for it to the TODO list that Bruce-san is maintaining.

Or, maybe we can just stop removing the stats files during recovery by keeping the files of the previous generation and using them as the current ones. I haven't checked how fresh the previous generation is (500ms old?). Slightly stale stats might be better than nothing.

> The larger picture here is that Takayuki-san wants us to commit a patch
> based on a customer's objection to 9.2's behavior, without any real evidence
> that the 9.4 change isn't a sufficient solution. I've got absolutely zero
> sympathy for that "the stats collector might be stuck in an unkillable state"
> argument --- where's the evidence that the stats collector is any more prone
> to that than any other postmaster child?

The 9.4 change may be sufficient. But I don't think I can confidently explain the logic to a really demanding customer. I can't answer the questions "Why does PostgreSQL write files that will be deleted, even during an 'immediate' shutdown? Why does PostgreSQL spend 5 seconds on nothing?"

Other children do nothing and exit immediately. I believe they are behaving correctly.

> And for that matter, if we are stuck because of a nonresponding NFS server,
> how is a quicker postmaster exit going to help anything?
> You're not going to be able to start a new postmaster if the data directory
> is on a nonresponsive server.

An NFS server can also be configured for HA, and the new postmaster can start as soon as the NFS server completes failover.

> I'd be willing to entertain a proposal to make the 5-second limit adjustable,
> but I don't think we need entirely new behavior here.

Then, I'm at a loss as to what to do for the 9.2 user.

Regards
Takayuki Tsunakawa
