Re: [RFC] Should we fix postmaster to avoid slow shutdown?

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC] Should we fix postmaster to avoid slow shutdown?
Date: 2016-11-26 15:43:49
Message-ID: CA+TgmoatGUFxcH6MduwtE=nyRScgNKbUFmPtRLh+55S2P3Pnjw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Nov 24, 2016 at 12:41 AM, Tsunakawa, Takayuki
<tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com> wrote:
> From: pgsql-hackers-owner(at)postgresql(dot)org
>> [mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Tom Lane
>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> > I agree. However, in many cases, the major cost of a fast shutdown is
>> > getting the dirty data already in the operating system buffers down to
>> > disk, not in writing out shared_buffers itself. The latter is
>> > probably a single-digit number of gigabytes, or maybe double-digit.
>> > The former might be a lot more, and the write of the pgstat file may
>> > back up behind it. I've seen cases where an 8kB buffered write from
>> > Postgres takes tens of seconds to complete because the OS buffer cache
>> > is already saturated with dirty data, and the stats files could easily
>> > be a lot more than that.
>>
>> I think this is mostly FUD, because we don't fsync the stats files. Maybe
>> we should, but we don't today. So even if we have managed to get the system
>> into a state where physical writes are heavily backlogged, that's not a
>> reason to assume that the stats collector will be unable to do its thing
>> promptly. All it has to do is push a relatively small amount of data into
>> kernel buffers.
>
> I'm sorry for my late reply; yesterday was a national holiday in Japan.
>
> It's not FUD. I understand you hit the slow stats file write problem during some regression test. You said it took 57 seconds to write the stats file during the postmaster shutdown. That caused pg_ctl stop to fail due to its 60 second timeout. Even the regression test environment suffered from the trouble.

+1.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
