Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-08-24 20:53:35
Message-ID: AANLkTinrLrFfdcCi_0EPTjyXuZG-DPiK4z6D3sEfcvOw@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Tue, Aug 24, 2010 at 9:58 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
>> Robert Haas wrote:
>>> Yeah, that seems very plausible, although exactly how to verify I don't know.
>
>> And here is confirmation from the Microsoft web site:
>
>>       In some instances, calling GetExitCode() against the failed process
>>       indicates the following exit code:
>>       128L ERROR_WAIT_NO_CHILDREN - There are no child processes to wait for.
>
> Given the existence of the deadman switch mechanism (which I hadn't
> remembered when this thread started), I'm coming around to the idea that
> we could just treat exit(128) as nonfatal on Windows.  If for some
> reason the child hadn't died instantly at startup, the deadman switch
> would distinguish that from the case described here.

So the options are:

(1) If running on Windows and the exit code is 128 and the deadman
switch is not engaged, don't crash-and-restart.
(2) If running on Windows, create a mutex in the parent process and
take it in the child; if the mutex has not been taken, don't
crash-and-restart.
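
To make the difference between the two options concrete, here is a rough sketch of the decision each one implies. It is not postmaster code: the ChildExit record, its field names, and both function names are hypothetical, and treating exit code 0 as "no restart" is my own assumption about the normal-exit case.

```python
from dataclasses import dataclass

# Windows exit code quoted earlier in the thread (ERROR_WAIT_NO_CHILDREN).
ERROR_WAIT_NO_CHILDREN = 128


@dataclass
class ChildExit:
    """Hypothetical summary of what the parent knows about a dead child."""
    on_windows: bool
    exit_code: int
    deadman_engaged: bool  # option (1): did the child engage the deadman switch?
    mutex_taken: bool      # option (2): did the child take the parent's mutex?


def needs_restart_option1(c: ChildExit) -> bool:
    """Option (1): on Windows, exit code 128 with the switch not engaged is nonfatal."""
    if c.exit_code == 0:
        return False  # assumption: a normal exit never forces a restart
    harmless = (c.on_windows
                and c.exit_code == ERROR_WAIT_NO_CHILDREN
                and not c.deadman_engaged)
    return not harmless


def needs_restart_option2(c: ChildExit) -> bool:
    """Option (2): on Windows, a child that never took the mutex is nonfatal."""
    if c.exit_code == 0:
        return False  # same assumption as above
    harmless = c.on_windows and not c.mutex_taken
    return not harmless
```

Note that option (1) depends on the exact exit code while option (2) ignores it, so a Windows child that dies with some other nonzero code before touching anything would force a restart under (1) but not under (2).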

There is some amount of user code (I'm not sure precisely how much)
that runs after shared memory is mapped and before the deadman switch
is engaged. If we go with option #1, it would probably behoove us to
try to minimize the amount of such code (at least in HEAD). There is
probably not a great deal of danger that we could manage to scribble
on shared memory and then exit normally (rather than via signal),
never mind the need to exit with exactly 128. But "not a great deal"
is not the same as "none". If we go with option #2, the principal
danger seems to be that the code Magnus wrote will turn out to be less
robust than we might hope; for example, it might not work on all
versions of Windows, or be prone to some other installation-dependent
mischief.

Another question is how far either of these fixes could be
back-patched. I believe the deadman switch only exists as far back
as 8.4, but the original commit message mentioned the possibility of
eventually back-patching it further:

    Although this problem is of long standing, the lack of field complaints
    seems to mean it's not critical enough to risk back-patching; at least
    not till we get some more testing of this mechanism.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
