Re: "pg_ctl: the PID file ... is empty" at end of make check

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: "pg_ctl: the PID file ... is empty" at end of make check
Date: 2018-11-28 05:31:10
Message-ID: CAEepm=1dONOF+hBijV45dw3nsKe+OazmFHK3Lr44AbRsVPZTyA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Nov 28, 2018 at 5:28 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> writes:
> > Today I saw a one-off case of $SUBJECT, on macOS. I can't reproduce
> > it, but I noticed exactly the same thing on longfin the other day:
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=longfin&dt=2018-11-25%2005%3A39%3A04
>
> I trawled the buildfarm logs and discovered a second instance of exactly
> the same thing:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=longfin&dt=2018-11-19%2018%3A37%3A00
>
> There have not been any other occurrences in the past 3 months, which is
> as far back as I went. (lorikeet has half a dozen occurrences of "could
> not stop postmaster", which is what I was grepping for, but they all
> are associated with that machine's intermittent postmaster crashes.)
>
> So that lets out the flaky-hardware theory: that occurrence is before
> longfin's hardware transplant.
>
> Also, I don't think I believe the OS-bug idea either, given that you
> saw it on 10.14.0. longfin's been running 10.14.something since
> 2018-09-26, and has accumulated circa 200 runs since then just on HEAD,
> never mind the back branches. It'd be pretty unlikely to see it only
> in the past week, and only on HEAD, if it were an OS bug introduced two
> months ago.

Yeah, it would have been slightly easier to believe back when High Sierra
first came out and every HFS+ volume was silently migrated to the brand
new APFS. That idea seems like a long shot at this point.

> So my theory is we broke something in HEAD a couple weeks ago. But what?

Hmm. Not seeing it. I'm trying to reproduce it with a make check loop.
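
For anyone who wants to try the same thing, a loop of that kind could look
roughly like the sketch below. It is only an illustration, not the script
actually used here: the function names, the run limit, and the choice to
match the message by two fragments (the path between them varies, so it is
not hard-coded) are all assumptions.

```python
import subprocess


def find_failure_line(output, fragments):
    """Return the first output line containing every fragment, else None."""
    for line in output.splitlines():
        if all(fragment in line for fragment in fragments):
            return line
    return None


def loop_until_failure(cmd, fragments, max_runs=100):
    """Run cmd up to max_runs times (hypothetical helper).

    Returns (run_number, matching_line) for the first run whose combined
    stdout/stderr contains a line with all fragments, or None if no run
    produced such a line.
    """
    for run in range(1, max_runs + 1):
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # pg_ctl messages may go to stderr
            text=True,
        )
        line = find_failure_line(proc.stdout, fragments)
        if line is not None:
            return run, line
    return None
```

From the top of a build tree this might be invoked as
loop_until_failure(["make", "check"], ("pg_ctl: the PID file", "is empty")),
which stops at the first run that prints the message and reports which
iteration it was.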

> The fsync changes you made are suspiciously close to this issue (ie one
> could explain it as written data not getting out), and were committed in
> the right time frame, but that change didn't affect writes to
> postmaster.pid did it?

Commit 9ccdd7f6 doesn't affect writes to anything. It only changes
the elevel used when certain fsync calls fail (none of which are near
this code, and in any case there was no failure).

--
Thomas Munro
http://www.enterprisedb.com
