Re: Tracing down buildfarm "postmaster does not shut down" failures

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Noah Misch <noah(at)leadboat(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Tracing down buildfarm "postmaster does not shut down" failures
Date: 2016-02-10 03:02:17
Message-ID: 4948.1455073337@sss.pgh.pa.us
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> anyway, we got a failure pretty quickly:
> pg_ctl: server does not shut down at 2016-02-09 21:10:11.914 EST
> ...
> LOG: received fast shutdown request at 2016-02-09 21:09:11.824 EST
> ...
> LOG: checkpointer dead at 2016-02-09 21:09:14.683 EST
> LOG: all children dead at 2016-02-09 21:10:11.184 EST
> ...
> LOG: lock files all released at 2016-02-09 21:10:11.211 EST

Hmm. Apparently, pg_ctl gave up exactly one second too early.
The way the wait loop in pg_ctl is coded, it waits one second more
after the last get_pgpid() probe before complaining --- so the last
time it looked for the pidfile was approximately 21:10:10.914, just
300ms before the postmaster removed it. I wonder if that's entirely
coincidence.

Still, it seems clear that the bulk of the shutdown time is indeed the
stats collector taking its time about shutting down, which is doubly
weird because the ecpg tests shouldn't have created very many tables,
so why would there be a lot of data to write? Even granting that it's
not writing to a ramdisk, 57 seconds to shut down seems pretty excessive.

I wonder if it's worth sticking some instrumentation into stats
collector shutdown?

regards, tom lane
