Re: FATAL: bogus data in lock file "postmaster.pid": ""

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, Michael Beattie <mtbeedee(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FATAL: bogus data in lock file "postmaster.pid": ""
Date: 2012-08-27 22:16:53
Message-ID: 1346105371-sup-9216@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Excerpts from Robert Haas's message of lun ago 27 18:02:06 -0400 2012:
> On Mon, Aug 27, 2012 at 4:29 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Bruce Momjian <bruce(at)momjian(dot)us> writes:
> >> I have developed the attached patch to report a zero-length file, as you
> >> suggested.
> >
> > DIRECTORY_LOCK_FILE is entirely incorrect there.
> >
> > Taking a step back, I don't think this message is much better than the
> > existing behavior of reporting "bogus data". Either way, it's not
> > obvious to typical users what the problem is or what to do about it.
> > If we're going to emit a special message I think it should be more user
> > friendly than this.
> >
> > Perhaps something like:
> >
> > FATAL: lock file "foo" is empty
> > HINT: This may mean that another postmaster was starting at the
> > same time. If not, remove the lock file and try again.
>
> The problem with this is that it gives the customer only one remedy,
> which they will (if experience is any guide) try whether it is
> actually correct to do so or not.

How about having it sleep for a short while, then try again? I would
expect that it would cause the second postmaster to fail during the
second try, which is okay because the first one is then operational.
The problem, of course, is how long to sleep so that this doesn't fail
when load is high enough that the first postmaster still hasn't written
the file after the sleep.

Maybe

LOG: lock file "foo" is empty, sleeping to retry
-- sleep 100ms and recheck
LOG: lock file "foo" is empty, sleeping to retry
-- sleep, dunno, 1s, recheck
LOG: lock file "foo" is empty, sleeping to retry
-- sleep maybe 5s? recheck
FATAL: lock file "foo" is empty
HINT: Is another postmaster running on data directory "bar"?

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Sabino Mullane 2012-08-27 22:19:43 Re: MySQL search query is not executing in Postgres DB
Previous Message Robert Haas 2012-08-27 22:06:42 Re: temporal support patch