Re: [PATCH] pg_isready (was: [WIP] pg_ping utility)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Craig Ringer <craig(at)2ndQuadrant(dot)com>
Cc: Phil Sorber <phil(at)omniti(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Erik Rijkers <er(at)xs4all(dot)nl>, Alvaro Herrera <alvherre(at)2ndQuadrant(dot)com>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Date: 2013-01-27 17:46:35
Message-ID: 10575.1359308795@sss.pgh.pa.us
Lists: pgsql-hackers

Craig Ringer <craig(at)2ndQuadrant(dot)com> writes:
> That's what it sounds like - confirming that PostgreSQL is really fully
> shut down.

> I'm not sure how you could do that over a protocol connection, myself.
> I'd just read the postmaster pid from the pidfile on disk and then `kill
> -0` it in a delay loop until the `kill` command returns failure. This
> could be a useful convenience utility but I'm not convinced it should be
> added to pg_isready because it requires local and possibly privileged
> execution, unlike pg_isready's network based operation. Privileges could
> be avoided by using an aliveness test other than `kill -0`, but you
> absolutely have to be local to verify that the postmaster has fully
> terminated - and it wouldn't make sense for a non-local process to care
> about this anyway.
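
For concreteness, the pidfile-plus-`kill -0` delay loop described in the quoted text could look roughly like the sketch below. It assumes a POSIX system and that the first line of postmaster.pid holds the postmaster PID; the function names are made up for illustration and are not part of any PostgreSQL tool. It only detects that the postmaster process itself has gone away.

```python
import os
import time


def parse_postmaster_pid(pidfile_text):
    """Return the PID found on the first line of postmaster.pid contents."""
    return int(pidfile_text.splitlines()[0].strip())


def pid_is_alive(pid):
    """Rough equivalent of `kill -0 pid`: probe for existence, send no signal."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        # No such process.
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def wait_until_gone(pid, timeout=60.0, interval=0.5):
    """Poll until the process disappears; return False if the timeout expires."""
    deadline = time.monotonic() + timeout
    while pid_is_alive(pid):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True
```

A caller would read the pidfile from the data directory, pass its contents to parse_postmaster_pid(), and hand the result to wait_until_gone(). Treating "permission denied" as "still alive" is the reason the check may need privileges to be conclusive only in the sense of signalling; existence itself is still reported.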

This problem is actually quite a bit more difficult than it looks.
In particular, the mere fact that the postmaster process is gone does
not prove that the cluster is idle: it's possible that the postmaster
crashed leaving orphan backends behind, and the orphans are still busily
modifying on-disk state. A real postmaster knows how to check for that
(by looking at the nattch count of the shmem segment cited in the old
lockfile) but I can't see any shell script getting it right.
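
To illustrate what "looking at the nattch count" means, here is a rough, Linux-only sketch that reads the attach count of a System V shared memory segment from /proc/sysvipc/shm. This is not what the postmaster does internally (it queries the segment directly through the shared memory API and performs further checks); the function names are hypothetical, and how the segment id is obtained from the old lockfile is deliberately left to the caller.

```python
def nattch_for_shmid(table_text, shmid):
    """Return the nattch value for shmid in /proc/sysvipc/shm text, or None."""
    lines = table_text.strip().splitlines()
    if not lines:
        return None
    # Locate columns by header name rather than by fixed position.
    header = lines[0].split()
    shmid_col = header.index("shmid")
    nattch_col = header.index("nattch")
    for line in lines[1:]:
        fields = line.split()
        if len(fields) <= max(shmid_col, nattch_col):
            continue  # skip malformed or truncated rows
        if int(fields[shmid_col]) == shmid:
            return int(fields[nattch_col])
    return None  # segment not listed, so it no longer exists


def segment_in_use(shmid, proc_path="/proc/sysvipc/shm"):
    """True if some process is still attached to the segment (Linux only)."""
    with open(proc_path) as f:
        count = nattch_for_shmid(f.read(), shmid)
    return count is not None and count > 0
```

A nonzero count with no live postmaster is the orphan-backend case described above. A sketch like this still skips the care the real check takes, for example confirming that the segment actually belongs to the data directory in question.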

So at the moment I wouldn't trust any method short of "try to start a new
postmaster and see if it works", which of course is not terribly helpful
if your objective is to get to a stopped state.

We could consider transposing the shmem logic into a new pg_ctl command.
It might be better though to have a new switch in the postgres
executable that just runs postmaster startup as far as detecting
lockfile conflicts, and reports what it found (without ever launching
any child processes that could confuse matters). Then "pg_ctl isdone"
could be a frontend for that, instead of duplicating logic.

regards, tom lane
