Re: Debugging buildfarm pg_upgrade check failures

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Debugging buildfarm pg_upgrade check failures
Date: 2015-07-25 20:55:32
Message-ID: 55B3F7C4.8000205@dunslane.net
Lists: pgsql-hackers


On 07/25/2015 10:59 AM, Tom Lane wrote:
> Now that we've restored proper logging of "make check", I looked into
> today's failure report from axolotl:
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=axolotl&dt=2015-07-24%2020%3A29%3A18
>
> What evidently happened there is that "pg_ctl start" gave up waiting for
> the postmaster to start too soon. The postmaster log appears to contain
>
> LOG: database system was shut down at 2015-07-24 16:45:40 EDT
> FATAL: the database system is starting up
> LOG: MultiXact member wraparound protections are now enabled
> LOG: database system is ready to accept connections
>
> which indicates that it did successfully come up, but not till after one
> "PQping" probe from pg_ctl, which was rejected with "still starting up".
> Meanwhile we've got this log output from pg_ctl:
>
> waiting for server to start........ stopped waiting
> pg_ctl: could not start server
> Examine the log output.
>
> Counting the dots indicates that pg_ctl gave up after precisely 5 seconds.
> Now, looking at the logic in pg_ctl's test_postmaster_connection(), the
> only explanation that seems to fit the observed output is that the stat()
> on the postmaster pidfile (at line 650 in HEAD) failed. It's not clear
> why though, since the postmaster was clearly still alive at this point,
> and we must have been able to read the pidfile earlier to construct a
> connection string, else there would have been no PQping attempt.
>
> Maybe the stat failed for some unexpected resource-exhaustion kind of
> reason?
>
> It seems plausible to me that we should change pg_ctl to only consider
> stat() failure to be a reason to give up waiting if errno is ENOENT,
> not anything else. At a minimum, I'd like to modify it to print the
> errno if it's anything else, so that we can confirm or deny this theory
> next time we see this buildfarm failure.
>
> Comments?
>
>

Certainly let's look at the errno.
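
Something along these lines, perhaps. This is only a minimal sketch of the idea, not the actual pg_ctl code: the names check_pidfile and PidfileStatus are made up for illustration, and the message wording is an assumption. It treats ENOENT as the only stat() failure that means "give up waiting", and reports any other errno so we can see it in the buildfarm logs.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical result codes; these names do not come from pg_ctl. */
typedef enum
{
    PIDFILE_PRESENT,        /* stat() succeeded */
    PIDFILE_GONE,           /* stat() failed with ENOENT: stop waiting */
    PIDFILE_STAT_ERROR      /* stat() failed some other way: report it */
} PidfileStatus;

/*
 * Check whether the pidfile at "path" still exists.
 *
 * Only ENOENT is reported as "gone".  Any other failure is printed with
 * its errno text and returned separately, so the caller can keep waiting
 * instead of giving up.
 */
static PidfileStatus
check_pidfile(const char *path)
{
    struct stat statbuf;

    assert(path != NULL);

    if (stat(path, &statbuf) == 0)
        return PIDFILE_PRESENT;

    if (errno == ENOENT)
        return PIDFILE_GONE;

    /* Unexpected failure: say why, so the log shows the real errno. */
    fprintf(stderr, "could not stat file \"%s\": %s\n",
            path, strerror(errno));
    return PIDFILE_STAT_ERROR;
}
```

A caller in the wait loop would stop only on PIDFILE_GONE and carry on probing in the other two cases.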

cheers

andrew
