From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Jan Wieck <JanWieck(at)Yahoo(dot)com> |
Cc: | Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: stats collector dies in current |
Date: | 2004-08-15 04:19:08 |
Message-ID: | 19363.1092543548@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Jan Wieck <JanWieck(at)Yahoo(dot)com> writes:
> In that context, is SIGTSTP similar to SIGSTOP in that it cannot be
> caught or ignored?
Possibly. I've reproduced the problem here on an RHL 8 system
(2.4.18 kernel) and I think it's a kernel bug. Points:
1. AFAICS, the only case where the stats buffer process will exit(1)
without logging a prior message is where it's gotten SIGCHLD. So,
hypothesis: it is the collector process (grandchild process) that
is dying.
2. Experiment one: try to strace the collector process to see what
it's doing. Result: failure goes away!!!
3. Experiment two: try to strace the buffer process. Result: indeed
it's getting SIGCHLD (in fact it seems to get it before SIGTSTP
arrives).
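A minimal sketch of the mechanism behind point 1, assuming a plain POSIX fork()/SIGCHLD setup. It shows why a collector (grandchild) death surfaces only as the buffer process exiting with status 1 and nothing logged. All names here (buffer_process_loop, fake_collector, sigchld_handler) are hypothetical, and this is not PostgreSQL's pgstat.c code. It records the signal in a flag instead of exiting inside the handler so that it can be exercised directly.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Written only by the SIGCHLD handler (hypothetical name). */
static volatile sig_atomic_t child_died = 0;

static void sigchld_handler(int signo)
{
    (void) signo;
    child_died = 1;
}

/* Hypothetical stand-in for the collector (the grandchild): it just dies. */
static void fake_collector(void)
{
    _exit(0);
}

/* Hypothetical stand-in for the buffer process. It forks the "collector",
 * sleeps until SIGCHLD reports the child is gone, and returns the status
 * the buffer process would exit with: 1, with no log message. */
static int buffer_process_loop(void)
{
    struct sigaction sa;
    sigset_t block, orig;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = sigchld_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, NULL) < 0)
        return -1;

    /* Block SIGCHLD first so it cannot arrive between testing the flag
     * and going to sleep. */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &block, &orig) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        sigprocmask(SIG_SETMASK, &orig, NULL);
        return -1;
    }
    if (pid == 0)
        fake_collector();            /* does not return */

    /* sigsuspend() atomically restores the old mask and sleeps. */
    while (!child_died)
        sigsuspend(&orig);

    waitpid(pid, NULL, 0);           /* reap the dead child */
    sigprocmask(SIG_SETMASK, &orig, NULL);
    return 1;                        /* the silent exit(1) status */
}
```

Attaching strace to the collector, as in point 2, changes its scheduling and signal delivery, which is one plausible reason the failure disappears under observation.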
So at the very least we've got a Heisenbug, but my opinion is we are
seeing broken kernel behavior.
The only difference in signal handling from 7.4 that I can see is that
the collector process explicitly executes pqsignal calls to re-establish
all the signal handlers it should have inherited from its parent.
I suspect (but haven't tested) that removing that supposedly redundant
code would make the failure go away again.
The handler re-establishment was put in because it is needed for the
EXEC_BACKEND case, but possibly we could make it #ifndef EXEC_BACKEND
to work around this problem.
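A rough sketch of the proposed workaround, assuming plain POSIX sigaction() in place of PostgreSQL's pqsignal() (whose exact flags are not reproduced here). A fork()ed child inherits its parent's signal dispositions, while exec() resets caught signals to SIG_DFL, so the re-establishment is only needed when EXEC_BACKEND is defined and can be skipped in the plain-fork build. EXEC_BACKEND is the only name taken from the message; install_handler, collector_setup_signals, collector_handler and handler_inherited_across_fork are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical handler standing in for one of the collector's real ones. */
static void collector_handler(int signo)
{
    (void) signo;
}

/* Install a handler with sigaction(); a rough stand-in for pqsignal(). */
static int install_handler(int signo, void (*handler)(int))
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Collector start-up: re-establish handlers only in the exec() build,
 * because a fork()ed child already has its parent's handlers. */
static int collector_setup_signals(void)
{
#ifdef EXEC_BACKEND
    if (install_handler(SIGHUP, collector_handler) < 0)
        return -1;
#endif
    return 0;
}

/* Returns 1 if a fork()ed child that runs collector_setup_signals()
 * still sees the SIGHUP handler its parent installed. */
static int handler_inherited_across_fork(void)
{
    if (install_handler(SIGHUP, collector_handler) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct sigaction cur;
        if (collector_setup_signals() < 0 || sigaction(SIGHUP, NULL, &cur) < 0)
            _exit(2);
        _exit(cur.sa_handler == collector_handler ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Without EXEC_BACKEND the child makes no signal calls at all and still reports the inherited handler, which is why the extra pqsignal calls are redundant in that build.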
regards, tom lane