Re: [SQL] PostgreSQL crashes on me :(

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mathijs Brands <mathijs(at)ilse(dot)nl>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [SQL] PostgreSQL crashes on me :(
Date: 2000-12-18 03:47:55
Message-ID: 21339.977111275@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-sql

Mathijs Brands <mathijs(at)ilse(dot)nl> writes:
> We recently installed a small server for an external party to develop
> websites on. This machine, a K6-233 with 256 MB, is running FreeBSD 3.3
> and PostgreSQL 7.0.2 (maybe I'll upgrade to 7.0.3 tonight). The database
> it's running is about 2 MB in size and gets to process an estimated 10000
> to 25000 queries per day. Nothing special, I'd say.

> However, pgsql keeps crashing. It can take days, but pgsql will crash.
> It spits out the following error:

> ServerLoop: select failed: No child processes

Hm. It seems fairly unlikely that select() would return error ECHILD,
which is what this message *appears* to imply. The code is

    if (select(nSockets, &rmask, &wmask, (fd_set *) NULL,
               (struct timeval *) NULL) < 0)
    {
        if (errno == EINTR)
            continue;
        fprintf(stderr, "%s: ServerLoop: select failed: %s\n",
                progname, strerror(errno));
        return STATUS_ERROR;
    }

which seems pretty straightforward. BUT: I think there's a race
condition here, at least on systems where errno is not saved and
restored around a signal handler. Consider the following scenario:

Postmaster is waiting at the select() --- its normal state.

Postmaster receives a SIGCHLD signal due to backend exit, so
it goes off and does the reaper() thing. On return from
reaper() the system arranges to return EINTR error from
the select().

Before control can reach the "if (errno..." test, another
SIGCHLD comes in. reaper() is invoked again and does its
thing.

The normal exit condition from reaper() will be errno == ECHILD,
because that's what the waitpid() or wait3() call will return after
all children are dealt with. If the signal-handling mechanism allows
that to be returned to the mainline code, we have a failure.
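
To make the ECHILD point concrete, here is a minimal sketch (not the postmaster code) of what a reaping loop leaves behind in errno. The function name errno_after_reaping is hypothetical, and it uses a blocking waitpid() for simplicity, whereas a real handler would normally pass WNOHANG.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork one short-lived child, reap until waitpid()
 * reports failure, and return the errno value left behind by that last
 * call.  Returns -1 if fork() itself fails.
 */
static int errno_after_reaping(void)
{
    pid_t pid = fork();
    int   status;

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(0);               /* child: exit immediately */

    for (;;)
    {
        if (waitpid(-1, &status, 0) > 0)
            continue;           /* reaped one child, look for more */
        if (errno == EINTR)
            continue;           /* interrupted, just retry */
        break;                  /* no children left: errno is ECHILD */
    }
    return errno;
}
```

If a handler doing the equivalent of this loop runs between select() returning and the mainline test of errno, the mainline sees ECHILD instead of EINTR, which matches the reported message.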

Can any FreeBSD hackers comment on the plausibility of this theory?

A quick-and-dirty workaround would be to save and restore errno in
reaper() and the other postmaster signal handlers. It might be
a better idea in the long run to avoid doing system calls in the
signal handlers --- but that would take a more substantial rewrite.

I seem to recall previous pghackers discussions in which
saving/restoring errno looked like a good idea. Not sure why
it hasn't been done already.

regards, tom lane
