Re: Attempt to stop dead instance can stop a random process?

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Attempt to stop dead instance can stop a random process?
Date: 2007-09-02 20:20:14
Message-ID: 46DAD4AE.EE98.0025.0@wicourts.gov
Lists: pgsql-hackers

>>> On Fri, Aug 31, 2007 at 3:10 PM, in message <1068(dot)1188591013(at)sss(dot)pgh(dot)pa(dot)us>,
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Hmm. Do I correctly grasp the picture that you've got several Postgres
> installations on the machine and they're all booted by startup scripts?
>
> In this situation, it's actually not a bad idea to run each one under a
> separate userid. The problem is that in successive reboots, each
> postmaster will typically get almost but not exactly the same PID as
> last time, since the number of processes launched earlier in system
> startup is mostly but not completely deterministic. If you start all
> the postmasters together, as you probably do, then there will be
> occasions when one gets a PID that another one had in the previous boot
> cycle. That can lead to refusal to start up: if a postmaster sees a
> postmaster lock file in its data directory, containing a PID that
> belongs to another live process owned by the same userid, it has to
> assume that that's a conflicting postmaster and it must respect the lock
> file. You can prevent that problem if each postmaster (data directory)
> belongs to a different userid.
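
To make the check Tom describes concrete, here is a minimal sketch in Python. It is not the postmaster's actual code. It assumes a Unix-like system, assumes the first line of the lock file holds the PID, and uses a hypothetical function name (lock_file_verdict). Sending signal 0 tests whether a process exists without actually signaling it. A "permission denied" result means the PID belongs to some other userid, which is why separate userids avoid the false conflict.

```python
import os


def lock_file_verdict(lock_path):
    """Classify a lock file as 'stale' or 'conflict' (illustrative only).

    Assumption: the first line of the file is the PID of the process
    that created it.
    """
    with open(lock_path) as f:
        pid = int(f.readline().strip())

    try:
        # Signal 0 performs the existence and permission checks only;
        # no signal is delivered to the target process.
        os.kill(pid, 0)
    except ProcessLookupError:
        # No process has this PID, so the lock file is left over.
        return "stale"
    except PermissionError:
        # A process has this PID but runs under a different userid,
        # so under the rule described above it is not treated as a
        # conflicting postmaster.
        return "stale"

    # A live process owned by the same userid has this PID. It may be
    # unrelated, but the lock must be respected.
    return "conflict"
```

With one userid shared by all clusters, any live process that inherited the old PID lands in the "conflict" branch, even if it is a postmaster for a different data directory.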

I was thinking of submitting a patch to add a recommendation to this effect
to section 16.1 ("The PostgreSQL User Account") in the documentation. Does
that seem appropriate to all? I'm not sure whether it would be worth
changing 16.2 ("Creating a Database Cluster") to say "while logged into the
PostgreSQL user account which you have chosen for the cluster".

> (Some people prefer to fix this by having a startup script that forcibly
> removes all the lockfiles before launching the postmasters. I think
> that's kinda risky, although if it's done in a separate script that
> you'd have no reason to run by hand, it's probably OK. Clueless folks
> put the action right in the postgresql start script, meaning that a
> thoughtless "service postgresql start" blows away the lock file...)

Would it be a good idea to mention pid file cleanup strategies in section
16.3 ("Starting the Database Server") where pid files are discussed, or
isn't that something we should get into in the docs?

Is there a suitable place in the documentation to describe common causes
of, and solutions for, messages such as these (from the log file)?

[2007-09-02 11:47:14.697 CDT] 7910 FATAL: lock file "postmaster.pid" already exists
[2007-09-02 11:47:14.697 CDT] 7910 HINT: Is another postmaster (PID 7760) running in data directory "/var/pgsql/data/county/dunn/data"?
[2007-09-02 14:45:28.541 CDT] 21735 FATAL: lock file "/tmp/.s.PGSQL.5417.lock" already exists
[2007-09-02 14:45:28.541 CDT] 21735 HINT: Is another postmaster (PID 7760) using socket file "/tmp/.s.PGSQL.5417"?
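
For anyone triaging messages like these, a small sketch (Python, with a hypothetical helper name, parse_lock_hint) that pulls the PID and path out of a HINT line. The pattern is written only against the two HINT wordings shown above and is not guaranteed to match other versions or locales.

```python
import re

# Matches the two HINT wordings quoted in the log excerpt above.
HINT_RE = re.compile(
    r'HINT:\s+Is another postmaster \(PID (?P<pid>\d+)\) '
    r'(?:running in data directory|using socket file) "(?P<path>[^"]+)"\?'
)


def parse_lock_hint(line):
    """Return (pid, path) from a lock-file HINT line, or None if no match."""
    m = HINT_RE.search(line)
    if m is None:
        return None
    return int(m.group("pid")), m.group("path")
```

Applied to the excerpt, both HINT lines yield PID 7760, once with the data directory path and once with the socket file path.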

-Kevin
