On Tue, Aug 24, 2010 at 9:58 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
>> Robert Haas wrote:
>>> Yeah, that seems very plausible, although exactly how to verify I don't know.
>> And here is confirmation from the Microsoft web site:
>> In some instances, calling GetExitCode() against the failed process
>> indicates the following exit code:
>> 128L ERROR_WAIT_NO_CHILDREN - There are no child processes to wait for.
> Given the existence of the deadman switch mechanism (which I hadn't
> remembered when this thread started), I'm coming around to the idea that
> we could just treat exit(128) as nonfatal on Windows. If for some
> reason the child hadn't died instantly at startup, the deadman switch
> would distinguish that from the case described here.

So the options are:

(1) If running on Windows and the exit code is 128 and the deadman
switch is not engaged, don't crash-and-restart.

(2) If running on Windows, create a mutex in the parent process and
take it in the child; if the mutex has not been taken, don't
crash-and-restart.
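
To make the difference between the two options concrete, here is a minimal C sketch of the parent-side decision. It is an illustration only: every name in it (`exempt_option1`, `exempt_option2`, `should_crash_and_restart`, `WIN_STARTUP_FAILURE_EXIT`) is hypothetical and not an actual PostgreSQL function or macro, and it assumes the parent already knows whether the child exited abnormally, whether the child engaged the deadman switch, and whether it took the mutex.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical name for the exit code Windows reports when the child dies
 * during startup (ERROR_WAIT_NO_CHILDREN, per the Microsoft note quoted
 * in this thread).
 */
#define WIN_STARTUP_FAILURE_EXIT 128

/*
 * Option (1): on Windows, treat exit(128) as nonfatal unless the child
 * had already engaged the deadman switch.
 */
static bool exempt_option1(bool on_windows, int exit_code, bool deadman_engaged)
{
    return on_windows && exit_code == WIN_STARTUP_FAILURE_EXIT && !deadman_engaged;
}

/*
 * Option (2): on Windows, the parent creates a mutex and the child takes it;
 * a child that never took the mutex is assumed never to have started.
 */
static bool exempt_option2(bool on_windows, bool child_took_mutex)
{
    return on_windows && !child_took_mutex;
}

/*
 * An abnormal child exit normally forces crash-and-restart, unless the
 * chosen option exempts it.
 */
static bool should_crash_and_restart(bool abnormal_exit, bool exempt)
{
    return abnormal_exit && !exempt;
}
```

The sketch shows where the risk of each option sits: option (1) keys on one specific exit code plus the deadman switch, so a child that ran some code before engaging the switch and then happened to exit with 128 would be wrongly exempted; option (2) depends only on whether the mutex handshake completed, so its risk lies in how robust that handshake is.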

There is some amount of user code (I'm not sure precisely how much)
that runs after shared memory is mapped and before the deadman switch
is engaged. If we go with option #1, it would probably behoove us to
try to minimize the amount of such code (at least in HEAD). There is
probably not a great deal of danger that we could manage to scribble
on shared memory and then exit normally (rather than via signal),
never mind the need to exit with exactly 128. But "not a great deal"
is not the same as "none". If we go with option #2, the principal
danger seems to be that the code Magnus wrote will turn out to be less
robust than we might hope; for example, it might not work on all
versions of Windows, or be prone to some other installation-dependent

Another question is how far either of these fixes could be
back-patched. I believe the deadman switch only exists as far back
as 8.4, but the original commit message mentioned the possibility of
eventually back-patching it further:

    Although this problem is of long standing, the lack of field complaints
    seems to mean it's not critical enough to risk back-patching; at least
    not till we get some more testing of this mechanism.

The Enterprise Postgres Company