Re: We should Axe /contrib/start-scripts

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Josh Berkus <josh(at)agliodbs(dot)com>, Chander Ganesan <chander(at)otg-nc(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: We should Axe /contrib/start-scripts
Date: 2009-08-19 21:22:23
Message-ID: 200908192122.n7JLMN726532@momjian.us
Lists: pgsql-hackers


Should we add a comment to the startup scripts linking to this email?

http://archives.postgresql.org/message-id/28922.1250715832@sss.pgh.pa.us

---------------------------------------------------------------------------

Tom Lane wrote:
> "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
> > Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >>> we do NOT use pg_ctl for [postmaster start], as it adds no value
> >>> and can cause the postmaster to misrecognize a stale lock file
>
> >> And? That statement was and remains perfectly correct.
>
> > Is this mentioned in the documentation somewhere that I've missed?
> > I'm curious what the issues are, and why we can solve it in a bash
> > script but not pg_ctl.
>
> It's been covered repeatedly in the archives, but I'm not sure if it's
> in the docs anywhere. The problem is that after a system crash and
> reboot, an old postmaster.pid file might be left behind. The postmaster
> can only safely remove this lock file if it is *certain* that it doesn't
> represent another live postmaster process. Otherwise it is honor-bound
> to commit hara-kiri instead of starting up. It can tell whether or not
> the PID in the file belongs to a live process and whether that process
> belongs to the postgres userid (by attempting kill(PID, 0) and seeing
> what it gets). If not, it can remove the file with a clear conscience.
> However, because of the way that Unix startup works, it is very likely
> that successive system boots will assign nearly (but not necessarily
> exactly) the same PID that the postmaster had on the previous cycle.
> So there's a high probability of a false positive from this test.
> If the PID matches our own exactly, we can discount it as a false
> positive. If it matches our parent's exactly, we can also discount it
> (knowing that a postmaster would never launch another postmaster
> directly, and being able to get the parent's PID via getppid()).
> But further up the chain, we're out of luck, because there is no
> "get grandparent pid" operation in Unix.
>
> What this all leads to is that it's safe to launch a postmaster from
> an init script via something like
> su - postgres sh -c "postmaster ..."
> The postmaster's parent process is a shell belonging to postgres,
> which it can discount via getppid(), and all further-up ancestors
> belong to root, so we can discount them via the kill test. So a
> false PID match cannot lead to failing to start. (You still have to
> be a bit careful about the form of the shell command, or there might
> be an intermediate postgres-owned shell process.)
>
> On the other hand, if you do
> su - postgres sh -c "pg_ctl ..."
> then the postmaster's parent process is pg_ctl, and its grandparent
> is a postgres-owned shell process, and it cannot tell that
> postgres-owned shell process apart from a genuine conflicting
> postmaster. So a chance match of the shell process's PID to what is in
> the leftover postmaster.pid file will force it to refuse to start.
> And that chance match is not a low probability --- in my experience
> it's one in ten or worse, in a reasonably stable system environment.
>
> You can imagine various workarounds involving having pg_ctl pass down
> its parent's PID, but you'll still get screwed if the initscript author
> is careless about how many levels of postgres-owned shell process there
> are. The long and the short of it is that it's best to not use pg_ctl.
> As mentioned, it doesn't buy much of anything for an initscript anyway.
>
> These considerations don't apply to ordinary hand launching of the
> postmaster, for the primary reason that the chance of a false PID match
> is several orders of magnitude smaller when you're talking about a
> manual restart --- the likely postmaster PID now ranges over the whole
> PID space instead of being within a few counts of the same thing. So we
> don't need to discourage people from using pg_ctl for ordinary restarts.
> The whole thing is really only a problem for initscript authors (who all
> know about it by now ;-))
>
> regards, tom lane

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
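
For readers following the archive link, the lock-file test described in the quoted message can be sketched roughly as below. This is only an illustration in Python, not the postmaster's actual C code: the function name pid_blocks_startup is hypothetical, and the sketch assumes a POSIX system and a caller that is not root (root may signal any process, so the ownership part of the test would tell it nothing).

```python
import os


def pid_blocks_startup(lock_pid: int) -> bool:
    """Return True if the PID found in a leftover lock file must be treated
    as a possibly live postmaster, meaning startup has to be refused.

    Hypothetical helper for illustration; assumes POSIX and a non-root caller.
    """
    if lock_pid <= 0:
        # kill() gives 0 and negative values special meanings (process
        # groups), so they are never valid lock-file contents here.
        raise ValueError("lock file PID must be positive")

    # A match with our own PID or our direct parent's PID can only be a
    # coincidence: a postmaster never launches another postmaster directly.
    if lock_pid in (os.getpid(), os.getppid()):
        return False

    try:
        # Signal 0 delivers nothing; it only performs the existence and
        # permission checks, which is the kill(PID, 0) probe from the email.
        os.kill(lock_pid, 0)
    except ProcessLookupError:
        # ESRCH: no such process, so the lock file is stale.
        return False
    except PermissionError:
        # EPERM: the process exists but belongs to a different user ID,
        # so it cannot be a postmaster running under our account.
        return False

    # The probe succeeded: a live process owned by our user ID. We cannot
    # tell it apart from a genuine postmaster, so we must refuse to start.
    return True
```

In the pg_ctl case from the message, the postmaster's grandparent is a shell owned by postgres. That PID is neither getpid() nor getppid(), and the kill probe succeeds, so a check of this shape returns True whenever the leftover file happens to hold that shell's PID. That is the false positive the email warns about.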
