Re: segfault in pg 8.4, CurrentResourceOwner == NULL while processing SIGTERM

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Dennis Koegel <dk(at)openit(dot)de>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: segfault in pg 8.4, CurrentResourceOwner == NULL while processing SIGTERM
Date: 2010-03-19 19:38:04
Message-ID: 4948.1269027484@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Dennis Koegel <dk(at)openit(dot)de> writes:
> [ crash when a SQL function is killed by SIGTERM ]

Frankly, I would expect this to fail with even higher probability in
8.1. We didn't even pretend that SIGTERM'ing individual backends was
a safe thing to do in 8.1. You ought to rethink your connection
management methodology.

Now having said that, there are a couple of bad things happening in
this stack trace:

1. sql_exec_error_callback() is supposing that it's safe for it to do
a SearchSysCache lookup. On reflection this is a fundamentally bad
idea, because an error context callback is typically going to get
invoked in already-aborted transactions, and it's never safe to initiate
new catalog accesses in such a situation. The error is probably masked
in most cases by the desired pg_proc row being already in cache, but
if it had gotten flushed from cache for some reason then bad things
would happen. And then of course we have this case, where we're outside
a transaction entirely by the time control arrives at log_disconnections.

2. log_disconnections() is blithely doing an ordinary ereport(), which
will result in attempting to annotate the disconnection log message
with whatever context data might be supplied by active callback stack
entries. I cannot imagine any situation where callback stack entries
would actually be relevant to the disconnection log message, and as we
see here it's a context where the callback functions are very possibly
going to misbehave. What seems like a good idea is to flush the error
callback stack to empty, either in log_disconnections itself or maybe
better at the beginning of proc_exit().

I think both of these things need to be changed. Comments?

BTW, a quick look through the sources says that
sql_function_parse_error_callback is also making the unsafe assumption
that it can do SearchSysCache, but I don't see any other error callback
that is doing anything unsafe. Some other callbacks with similar needs,
such as sql_inline_error_callback, expect the callback setup code to
have fetched the tuple for them, and that seems like what we want for
SQL functions too.

regards, tom lane

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2010-03-19 20:13:40 Re: BUG #5384: pg_dump hard-codes use of /tmp
Previous Message Jon Nelson 2010-03-19 16:56:49 BUG #5384: pg_dump hard-codes use of /tmp