Re: random failing builds on spoonbill - backends not exiting...

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: random failing builds on spoonbill - backends not exiting...
Date: 2012-06-24 16:14:08
Message-ID: 10830.1340554448@sss.pgh.pa.us
Lists: pgsql-hackers

Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> writes:
> On 06/22/2012 11:47 PM, Tom Lane wrote:
>> Could you gdb each of these processes and get a stack trace?

[ unsurprising stack traces ]

OK, so they're waiting exactly where they should be.

So what we know is that the shutdown failure is caused by the child
processes having all blockable signals blocked. What we don't know
is how they could have gotten that way. I do not see any way that
the code in Postgres would ever block all signals, which makes this
look a lot like a BSD libc bug. But that doesn't help much.

The only way I can think of to narrow it down is to run the postmaster
under "strace -f" or local equivalent, and look for sigprocmask calls
that set all the bits. That could be pretty tedious though, if the
occurrence of the bug is as low-probability as it seems to be.

Given your earlier comment about a new threading library, I wonder
whether this has something to do with confusion between process-wide
and per-thread signal masks ...

regards, tom lane
