Re: FATAL: bogus data in lock file "postmaster.pid": ""

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Beattie <mtbeedee(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FATAL: bogus data in lock file "postmaster.pid": ""
Date: 2012-08-28 02:17:43
Message-ID: 20120828021743.GC6786@momjian.us
Lists: pgsql-hackers

On Mon, Aug 27, 2012 at 09:59:10PM -0400, Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > On Mon, Aug 27, 2012 at 07:39:35PM -0400, Tom Lane wrote:
> >> I could get behind that, but I don't think the delay should be more than
> >> 100ms or so.
>
> > I took Alvaro's approach of a sleep. The file test was already in a
> > loop that went 100 times. Basically, if the lock file exists, this
> > postmaster isn't going to succeed, so I figured there is no reason to
> > rush in the testing. I gave it 5 tries with one second between
> > attempts. Either the file is being populated, or it is stale and empty.
>
> How did "100ms" translate to 5 seconds?

That was the "no need to rush, let's just be sure of what we report" reasoning.

> > I checked pg_ctl, and it has a default wait of 60 seconds, so taking 5 seconds
> > to exit out of the postmaster should be fine.
>
> pg_ctl is not the only consideration here. In particular, there are a
> lot of initscripts out there (all of Red Hat's, for instance) that don't
> use pg_ctl and expect the postmaster to come up (or not) in a couple of
> seconds.
>
> I don't see a need for more than about one retry with 100ms delay.
> There is no evidence that the case we're worried about has ever occurred
> in the real world anyway, so slowing down error failures to make really
> really really sure there's not a competing postmaster doesn't seem like
> a good tradeoff.
>
> I'm not terribly impressed with that errhint, either.

My concern with 100ms is that we can't be sure whether the file is still
being created, and if we can't be sure, I doubt there is much point in
trying to clarify the odd error message we emit.
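For concreteness, the two retry policies being debated can be sketched in C. This is a hypothetical illustration, not the patch itself and not PostgreSQL's actual CreateLockFile() logic; the function name read_lock_file_with_retry and its parameters are invented here. The worst-case added delay is max_retries * delay_ms: Tom's suggestion corresponds to max_retries = 1, delay_ms = 100 (at most 100ms), while max_retries = 5, delay_ms = 1000 matches the 5-second wait in the patch's output.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical sketch, not PostgreSQL code.
 * Reads up to bufsize-1 bytes of the lock file at path into buf and
 * NUL-terminates it. If the file exists but is empty, a competing
 * postmaster may have created it without yet writing its PID, so sleep
 * delay_ms and re-read, at most max_retries more times.
 * Returns bytes read, 0 if the file stayed empty, or -1 on error.
 */
static ssize_t
read_lock_file_with_retry(const char *path, char *buf, size_t bufsize,
                          int max_retries, long delay_ms)
{
    assert(bufsize > 0);        /* need room for the terminating NUL */

    for (int attempt = 0;; attempt++)
    {
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            return -1;          /* e.g. ENOENT: no lock file at all */

        ssize_t len = read(fd, buf, bufsize - 1);
        int save_errno = errno; /* close() may clobber errno */
        close(fd);
        if (len < 0)
        {
            errno = save_errno;
            return -1;
        }
        buf[len] = '\0';

        /* Got data, or retries exhausted: report what we have. */
        if (len > 0 || attempt >= max_retries)
            return len;

        /* Still empty: wait before looking again (EINTR ignored here). */
        struct timespec ts = { delay_ms / 1000, (delay_ms % 1000) * 1000000L };
        nanosleep(&ts, NULL);
    }
}
```

A caller following Tom's suggestion would use read_lock_file_with_retry("postmaster.pid", buf, sizeof buf, 1, 100) and treat a return of 0 as "still empty after one retry".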

FYI, here is what the code does now with a zero-length pid file, with my
patch:

$ postmaster
[ wait 5 seconds ]
FATAL: lock file "postmaster.pid" is empty
HINT: Empty lock file probably left from operating system crash during
database startup; file deletion suggested.
$ pg_ctl start
pg_ctl: invalid data in PID file "/u/pgsql/data/postmaster.pid"
$ pg_ctl -w start
pg_ctl: invalid data in PID file "/u/pgsql/data/postmaster.pid"

Seems pg_ctl would also need some cleanup if we change the error
message and/or timing.

I am thinking we should just change the error message in the postmaster
and pg_ctl to say the file is empty, and call it done (no hint message).
If we do want a hint, it should say that either the file is stale from a
crash or another postmaster is starting up, and let the user diagnose it.
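A minimal sketch of the proposed message split, assuming the caller has already read the file into a NUL-terminated string: an empty file is reported as empty, and anything whose first line is not a positive integer is reported as invalid data. The enum and function names are hypothetical, and the sketch ignores the additional lines that postmaster.pid carries after the PID.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical status codes for a lock file's contents. */
typedef enum { PIDFILE_OK, PIDFILE_EMPTY, PIDFILE_INVALID } pidfile_status;

/*
 * Classify lock-file contents. Only the first line (the PID) is checked;
 * on success the PID is stored in *pid_out.
 */
static pidfile_status
classify_pid_file(const char *contents, long *pid_out)
{
    assert(contents != NULL && pid_out != NULL);

    if (contents[0] == '\0')
        return PIDFILE_EMPTY;       /* report "lock file ... is empty" */

    char *end;
    long pid = strtol(contents, &end, 10);

    /* No digits, non-positive PID, or trailing junk on the first line. */
    if (end == contents || pid <= 0 || (*end != '\0' && *end != '\n'))
        return PIDFILE_INVALID;     /* report "invalid data in PID file" */

    *pid_out = pid;
    return PIDFILE_OK;
}
```

With this split, the postmaster and pg_ctl could share one notion of "empty" versus "invalid", which is the cleanup the pg_ctl output above suggests is needed.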

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +
