Re: Tracing down buildfarm "postmaster does not shut down" failures

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Tracing down buildfarm "postmaster does not shut down" failures
Date: 2016-02-10 03:27:07
Message-ID: 5713.1455074827@sss.pgh.pa.us
Lists: pgsql-hackers

Noah Misch <noah(at)leadboat(dot)com> writes:
> On Tue, Feb 09, 2016 at 10:02:17PM -0500, Tom Lane wrote:
>> I wonder if it's worth sticking some instrumentation into stats
>> collector shutdown?

> I wouldn't be surprised if the collector got backlogged during the main phase
> of testing and took a while to chew through its message queue before even
> starting the write of the final stats.

But why would the ecpg tests show such an effect when the main regression
tests don't? AFAIK the ecpg tests don't exactly stress the server; note
the trivial amount of data written by the shutdown checkpoint, for
instance.

The other weird thing is that it's only sometimes slow. If you look at
the last buildfarm result from axolotl, for instance, the tail end of
the ecpg log is

LOG: ShutdownSUBTRANS() complete at 2016-02-09 16:31:14.784 EST
LOG: database system is shut down at 2016-02-09 16:31:14.784 EST
LOG: lock files all released at 2016-02-09 16:31:14.817 EST

so we spent only ~33ms on the stats write that time.
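To check the gap in log tails like the one above without doing the subtraction by hand, a small script can pull the timestamps out of the instrumented LOG lines. This is a minimal sketch that assumes the "LOG: <message> at YYYY-MM-DD HH:MM:SS.mmm TZ" format shown here, with all lines in one time zone; the function names are hypothetical and not part of any buildfarm tooling.

```python
import re
from datetime import datetime

# Matches lines like "LOG: database system is shut down at 2016-02-09 16:31:14.784 EST".
# The trailing zone abbreviation is matched but ignored (all lines assumed same zone).
LINE_RE = re.compile(
    r"^LOG:\s+(?P<msg>.*?) at (?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \S+$"
)


def parse_log_tail(text):
    """Return (message, datetime) pairs for every timestamped LOG line in text."""
    events = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
            events.append((m.group("msg"), ts))
    return events


def gap_ms(events, start_msg, end_msg):
    """Milliseconds from the first start_msg event to the first end_msg event."""
    start = next(ts for msg, ts in events if msg == start_msg)
    end = next(ts for msg, ts in events if msg == end_msg)
    return (end - start).total_seconds() * 1000.0


if __name__ == "__main__":
    tail = """\
LOG: ShutdownSUBTRANS() complete at 2016-02-09 16:31:14.784 EST
LOG: database system is shut down at 2016-02-09 16:31:14.784 EST
LOG: lock files all released at 2016-02-09 16:31:14.817 EST
"""
    events = parse_log_tail(tail)
    print(round(gap_ms(events, "database system is shut down", "lock files all released")))
```

Run on the axolotl tail above, it prints 33 (milliseconds between shutdown and lock-file release).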

The idea I was toying with is that previous filesystem activity (making
the temp install, the server's never-fsync'd writes, etc.) has built up a
bunch of dirty kernel buffers, and at some point the kernel goes nuts
writing all that data. So the issues we're seeing would come and go
depending on the timing of that I/O spike. I'm not sure how to prove
such a theory from here.
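One hypothetical way to gather evidence for the dirty-buffer theory, assuming a Linux buildfarm host, would be to sample the kernel's Dirty and Writeback counters from /proc/meminfo during a run and line the samples up against the shutdown timestamps in the server log. The sketch below is illustrative instrumentation only, not something the buildfarm client does; the sampling interval, sample count and output format are arbitrary choices.

```python
import os
import sys
import time
from datetime import datetime


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: numeric value} (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def sample_dirty(path="/proc/meminfo"):
    """Return (Dirty, Writeback) in kB read from a meminfo file (Linux only)."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    return info.get("Dirty", 0), info.get("Writeback", 0)


def main(interval=0.5, count=1):
    if not os.path.exists("/proc/meminfo"):
        print("no /proc/meminfo; this sketch assumes Linux", file=sys.stderr)
        return
    # One millisecond-resolution line per sample, comparable to the log timestamps.
    for i in range(count):
        dirty, writeback = sample_dirty()
        stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]
        print(f"{stamp} Dirty={dirty} kB Writeback={writeback} kB", flush=True)
        if i + 1 < count:
            time.sleep(interval)


if __name__ == "__main__":
    # Optional first argument: number of samples to take (default 1).
    main(count=int(sys.argv[1]) if len(sys.argv) > 1 else 1)
```

A spike in Dirty or Writeback just before a slow "lock files all released" line would be consistent with the theory, though it would not by itself prove that the stats write was the part being stalled.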

regards, tom lane
