Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Chris Travers <chris(at)metatrontech(dot)com>, Cristian Bittel <cbittel(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [BUGS] BUG #5305: Postgres service stops when closing Windows session
Date: 2010-08-24 12:57:34
Message-ID: 201008241257.o7OCvYt12456@momjian.us
Lists: pgsql-bugs pgsql-hackers

Robert Haas wrote:
> [moving to -hackers]
>
> On Thu, Aug 19, 2010 at 9:43 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > I suspect this is the same problem as bug #4897, and probably also the
> > same problem as this:
> > http://archives.postgresql.org/pgsql-bugs/2009-08/msg00114.php
> >
> > and maybe also this and this:
> > http://archives.postgresql.org/pgsql-bugs/2010-02/msg00179.php
> > http://archives.postgresql.org/pgsql-admin/2009-05/msg00105.php
> >
> > Unfortunately, it seems that no one has been able to get a stack trace yet.
>
> Bruce pointed out yet another report of this problem to me:
>
> http://archives.postgresql.org/pgsql-general/2010-08/msg00550.php
>
> After some discussion with Magnus, I think what is going on here is
> that the postmaster kicks off a new child process, which terminates
> before it actually starts running our code, either in OS-supplied code
> or some sort of "filter" like anti-spam or anti-virus software. It's
> presumably NOT dying in our code because - at least AFAICS - we don't
> exit(128) anywhere. One way we could possibly improve the situation
> is to not treat this as a child crash - that is, don't do a
> crash-and-restart cycle; just treat that backend as having done
> elog(FATAL). The trick is that you need a reliable way to distinguish
> between a regular child crash and an "early" child crash. Magnus
> suggested perhaps we could create a mutex that the child grabs before
> mapping shared memory; the postmaster could check whether the mutex
> had been taken. If so, we handle the crash normally; if not, we just
> chalk it up to experience and continue on.
>
> This isn't really a "fix" for the bug in the sense that the nicest
> thing of all would be to prevent the child from exiting abnormally in
> the first place. But it's far from clear that we can control that.
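
To make the proposal quoted above concrete, here is a minimal sketch in C
of the decision the postmaster would make when a child exits. This is not
PostgreSQL's actual code: ChildExitKind, classify_child_exit and
startup_marker_taken are hypothetical names, and the boolean stands in for
"the child grabbed the mutex before mapping shared memory". Treating only
exit code 0 as a clean exit is a simplification made for the sketch.

```c
#include <stdbool.h>

/* Hypothetical classification of a child exit; names are illustrative only. */
typedef enum
{
    CHILD_EXIT_CLEAN,           /* exit code 0: nothing to do */
    CHILD_EXIT_EARLY_FAILURE,   /* died before reaching our code: log and continue */
    CHILD_EXIT_CRASH            /* died after startup: crash-and-restart cycle */
} ChildExitKind;

/*
 * startup_marker_taken is an assumed flag meaning "the child acquired the
 * startup mutex", i.e. it got far enough to run our code and may have
 * touched shared memory.
 */
ChildExitKind
classify_child_exit(int exit_code, bool startup_marker_taken)
{
    if (exit_code == 0)
        return CHILD_EXIT_CLEAN;
    if (!startup_marker_taken)
        return CHILD_EXIT_EARLY_FAILURE;    /* never reached shared memory */
    return CHILD_EXIT_CRASH;                /* shared memory may be corrupt */
}
```

With this shape, a child that exits with 128 before taking the marker is
only logged, while the same exit code after the marker is taken still
triggers the normal crash handling.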

This URL has some interesting details on our problem:

http://stackoverflow.com/questions/139090/getexitcodeprocess-returns-128

Error code 128 is identified as:

    ERROR_WAIT_NO_CHILDREN (128, 0x80): There are no child processes
    to wait for.
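
As a small illustration of that mapping, here is a sketch in C.
ERROR_WAIT_NO_CHILDREN has the value 128 in the Windows headers; it is
redefined here under a hypothetical local name so the sketch compiles
without them, and describe_exit_code is likewise just an illustrative
helper. Whether the child's exit code really is this Win32 error, rather
than an unrelated 128, is not established.

```c
#include <stddef.h>

/* Assumed local stand-in for ERROR_WAIT_NO_CHILDREN (128, 0x80). */
#define SKETCH_ERROR_WAIT_NO_CHILDREN 128UL

/*
 * Return a description for the one code discussed here, or NULL for any
 * other value. Hypothetical helper, not part of any real API.
 */
const char *
describe_exit_code(unsigned long code)
{
    if (code == SKETCH_ERROR_WAIT_NO_CHILDREN)
        return "ERROR_WAIT_NO_CHILDREN: There are no child processes to wait for";
    return NULL;
}
```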

and the suggested cause is:

    Have a look at Desktop Heap memory.

    Essentially the desktop heap issue comes down to exhausted resources (eg
    starting too many processes). When your app runs out of these resources,
    one of the symptoms is that you won't be able to start a new process,
    and the call to CreateProcess will fail with code 128.

My guess is that enough desktop heap memory is available when
CreateProcess() is called, but that at some later point, perhaps because
of a logout, it no longer is, so the process never actually starts.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +
