Skip site navigation (1) Skip section navigation (2)

Re: We should Axe /contrib/start-scripts

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Josh Berkus <josh(at)agliodbs(dot)com>, Chander Ganesan <chander(at)otg-nc(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: We should Axe /contrib/start-scripts
Date: 2009-08-19 21:22:23
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-hackers
Should we add a comment to the startup scripts linking this email?


Tom Lane wrote:
> "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
> > Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote: 
> >>> we do NOT use pg_ctl for [postmaster start], as it adds no value
> >>> and can cause the postmaster to misrecognize a stale lock file
> >> And?  That statement was and remains perfectly correct.
> > Is this mentioned in the documentation somewhere that I've missed?
> > I'm curious what the issues are, and why we can solve it in a bash
> > script but not pg_ctl.
> It's been covered repeatedly in the archives, but I'm not sure if it's
> in the docs anywhere.  The problem is that after a system crash and
> reboot, an old file might be left behind.  The postmaster
> can only safely remove this lock file if it is *certain* that it doesn't
> represent another live postmaster process.  Otherwise it is honor-bound
> to commit hara-kiri instead of starting up.  It can tell whether or not 
> the PID in the file belongs to a live process and whether that process
> belongs to the postgres userid (by attempting kill(PID, 0) and seeing
> what it gets).  If not, it can remove the file with a clear conscience.
> However, because of the way that Unix startup works, it is very likely
> that successive system boots will assign nearly (but not necessarily
> exactly) the same PID that the postmaster had on the previous cycle.
> So there's a high probability of a false positive from this test.
> If the PID matches our own exactly, we can discount it as a false
> positive.  If it matches our parent's exactly, we can also discount it
> (knowing that a postmaster would never launch another postmaster
> directly, and being able to get the parent's PID via getppid()).
> But further up the chain, we're out of luck, because there is no
> "get grandparent pid" operation in Unix.
> What this all leads to is that it's safe to launch a postmaster from
> an init script via something like
> 	su - postgres sh -c "postmaster ..."
> The postmaster's parent process is a shell belonging to postgres,
> which it can discount via getppid(), and all further-up ancestors
> belong to root, so we can discount them via the kill test.  So a
> false PID match cannot lead to failing to start.  (You still have to
> be a bit careful about the form of the shell command, or there might
> be an intermediate postgres-owned shell process.)
> On the other hand, if you do
> 	su - postgres sh -c "pg_ctl ..."
> then the postmaster's parent process is pg_ctl, and its grandparent
> is a postgres-owned shell process, and it cannot tell that
> postgres-owned shell process apart from a genuine conflicting
> postmaster.  So a chance match of the shell process's PID to what is in
> the leftover file will force it to refuse to start.
> And that chance match is not a low probability --- in my experience
> it's one in ten or worse, in a reasonably stable system environment.
> You can imagine various workarounds involving having pg_ctl pass down
> its parent's PID, but you'll still get screwed if the initscript author
> is careless about how many levels of postgres-owned shell process there
> are.  The long and the short of it is that it's best to not use pg_ctl.
> As mentioned, it doesn't buy much of anything for an initscript anyway.
> These considerations don't apply to ordinary hand launching of the
> postmaster, for the primary reason that the chance of a false PID match
> is several orders of magnitude smaller when you're talking about a
> manual restart --- the likely postmaster PID now ranges over the whole
> PID space instead of being within a few counts of the same thing.  So we
> don't need to discourage people from using pg_ctl for ordinary restarts.
> The whole thing is really only a problem for initscript authors (who all
> know about it by now ;-))
> 			regards, tom lane
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:

  Bruce Momjian  <bruce(at)momjian(dot)us>

  + If your life is a hard drive, Christ can be your backup. +

In response to

pgsql-hackers by date

Next:From: Tom LaneDate: 2009-08-19 21:29:34
Subject: Re: We should Axe /contrib/start-scripts
Previous:From: David E. WheelerDate: 2009-08-19 21:14:28
Subject: Re: We should Axe /contrib/start-scripts

Privacy Policy | About PostgreSQL
Copyright © 1996-2018 The PostgreSQL Global Development Group