Re: Re: [SQL] PostgreSQL crashes on me :(

From: Ian Lance Taylor <ian(at)airs(dot)com>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: mathijs(at)ilse(dot)nl, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: [SQL] PostgreSQL crashes on me :(
Date: 2000-12-18 17:33:40
Message-ID: 20001218173340.29569.qmail@daffy.airs.com
Lists: pgsql-hackers pgsql-sql

> Date: Sun, 17 Dec 2000 22:47:55 -0500
> From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
>
> BUT: I think there's a race
> condition here, at least on systems where errno is not saved and
> restored around a signal handler. Consider the following scenario:
>
> Postmaster is waiting at the select() --- its normal state.
>
> Postmaster receives a SIGCHLD signal due to backend exit, so
> it goes off and does the reaper() thing. On return from
> reaper() the system arranges to return EINTR error from
> the select().
>
> Before control can reach the "if (errno..." test, another
> SIGCHLD comes in. reaper() is invoked again and does its
> thing.
>
> The normal exit condition from reaper() will be errno == ECHILD,
> because that's what the waitpid() or wait3() call will return after
> all children are dealt with. If the signal-handling mechanism allows
> that to be returned to the mainline code, we have a failure.
>
> Can any FreeBSD hackers comment on the plausibility of this theory?

I'm not a FreeBSD hacker, but I do know how the BSD kernel works, so
unless FreeBSD has changed things, the important facts are:

1) The kernel only delivers signals when a process moves from kernel
mode to user mode, after a system call or an interrupt (including a
timer interrupt).

2) The errno variable is set in user space, by the C library's
system-call stub, after the process has returned to user mode.

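A SIGCHLD delivered as select() itself returns is therefore handled
before the C library stores EINTR in errno, so it is harmless. The
scenario you describe is possible, but only if there happen to be both
a timer interrupt and a SIGCHLD signal within a couple of instructions
after select() returns.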

(I suppose that a page fault instead of a timer interrupt could have
the same effect, although a page fault at that point seems quite
unlikely unless the system is extremely overloaded.)

> A quick-and-dirty workaround would be to save and restore errno in
> reaper() and the other postmaster signal handlers. It might be
> a better idea in the long run to avoid doing system calls in the
> signal handlers --- but that would take a more substantial rewrite.

Ideally, signal handlers should not make system calls. However, if
that is impossible, a signal handler must save errno on entry and
restore it before returning.

Ian
