Re: [RFC] Should we fix postmaster to avoid slow shutdown?

From: "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>
To: 'Tom Lane' <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC] Should we fix postmaster to avoid slow shutdown?
Date: 2016-11-24 05:41:08
Message-ID: 0A3221C70F24FB45833433255569204D1F65A8D6@G01JPEXMBYT05
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

From: pgsql-hackers-owner(at)postgresql(dot)org
> [mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Tom Lane
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > I agree. However, in many cases, the major cost of a fast shutdown is
> > getting the dirty data already in the operating system buffers down to
> > disk, not in writing out shared_buffers itself. The latter is
> > probably a single-digit number of gigabytes, or maybe double-digit.
> > The former might be a lot more, and the write of the pgstat file may
> > back up behind it. I've seen cases where an 8kB buffered write from
> > Postgres takes tens of seconds to complete because the OS buffer cache
> > is already saturated with dirty data, and the stats files could easily
> > be a lot more than that.
>
> I think this is mostly FUD, because we don't fsync the stats files. Maybe
> we should, but we don't today. So even if we have managed to get the system
> into a state where physical writes are heavily backlogged, that's not a
> reason to assume that the stats collector will be unable to do its thing
> promptly. All it has to do is push a relatively small amount of data into
> kernel buffers.

I'm sorry for my late reply, yesterday was a national holiday in Japan.

It's not FUD. I understand you hit the slow stats file write problem during some regression test. You said it took 57 seconds to write the stats file during the postmaster shutdown. That caused pg_ctl stop to fail due to its 60 second timeout. Even the regression test environment suffered from the trouble.

Regards
Takayuki Tsunakawa

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tsunakawa, Takayuki 2016-11-24 05:57:16 Re: [RFC] Should we fix postmaster to avoid slow shutdown?
Previous Message Ashutosh Bapat 2016-11-24 05:39:22 Re: Declarative partitioning - another take