Re: [HACKERS] postmaster disappears

From: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: t-ishii(at)sra(dot)co(dot)jp, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] postmaster disappears
Date: 1999-09-22 04:49:09
Message-ID: 199909220449.NAA26668@srapc451.sra.co.jp
Lists: pgsql-hackers

>> Not sure. reaper() may be called while reaper() is executing if a new
>> SIGCHLD is raised. How do you handle this case?
>
>No, because the signal is disabled when the trap is taken, and then not
>re-enabled until reaper() does pqsignal() just before exiting. We don't

You are correct. I had the wrong impression about signal handling.

>>> Moreover, you're not actually checking what the select() did unless
>>> you do it that way.
>
>> Sorry, I don't understand this. Can you explain, please?
>
>If you don't have the signal routine save/restore errno, then (when this
>problem occurs) you are not seeing the errno returned by the select(),
>but one left over from reaper()'s activity. If the select() failed, you
>won't know it.

Oh, I see your point.

>>> Curious that this sort of problem is not seen more often --- I wonder
>>> if most Unixes arrange to save/restore errno around a signal handler
>>> for you?
>
>> Maybe because the situation I have pointed out is relatively rare.
>
>Well, the window for trouble is awfully tiny in this particular code of
>ours, but it might be larger in other programs.

Though it seems rare, we have certainly been getting this kind of report
from users for a while. Since a disappearing postmaster is a really bad
thing, I would love to see a solution for this.

>Yet I don't think I've
>ever heard a programming recommendation to save/restore errno in signal
>handlers...

Agreed. I don't like that approach.

I asked a Unix guru, and got the suggestion that we do not need to call
wait() (and CleanupProc()) inside the signal handler. Instead, we could
install a null signal handler for SIGCHLD (one that just calls
pqsignal()). If select() fails with EINTR, we then call wait() and
CleanupProc() from the main loop. Moreover, this would eliminate the
sigprocmask() or sigblock() calls currently done to avoid race
conditions before going into the critical region. Of course, we have to
call wait() and CleanupProc() before select() to make sure that no
exited children are left waiting.

Another way would be to block SIGCHLD before calling select(). In
this case an appropriate timeout setting for select() is necessary,
though.
--
Tatsuo Ishii
