Re: pg_terminate_backend() issues

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_terminate_backend() issues
Date: 2008-04-16 04:54:31
Message-ID: 22867.1208321671@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> I am starting to think we need to analyze the source code rather than
> testing, because of what we are testing for.

I was thinking about that a bit more, in connection with trying to
verbalize why I don't like your patch ;-)

The fundamental assumption here is that we think that proc_exit() from
the main loop of PostgresMain() should be safe, because that's exercised
anytime a client quits or dies unexpectedly, and so it should be a
well-tested path. What is lacking such test coverage is the many code
paths that start proc_exit() from any random CHECK_FOR_INTERRUPTS()
point in the backend. Some of those points might be (in fact are known
to be, as of today) holding locks or otherwise in a state that prevents
a clean proc_exit().

Your patch proposes to replace that code path by one that fakes a
QueryCancel error and then does proc_exit() once control reaches the
outer level. While that certainly *should* work (ignoring the question
of whether control gets to the outer level in any reasonable amount of
time), it's a bit of a stretch to claim that it's a well-tested case.
At the moment it would only arise if a QueryCancel interrupt arrived at
approximately the same time that the client died or sent a disconnect
message, which is surely far less probable than client death by itself.
So I don't buy the argument that this is a better-tested path, and
I definitely don't want to introduce new complexity in order to make it
happen like that.

Now, as to whether a SIGTERM-style exit is safe. With the exception
of the PG_CATCH issues that we already know about, any other case seems
like a clear coding error: you'd have to be doing something that you
know could throw an error, without having made provision to clean up
whatever unclean state you are in. It'd be nice to think that we
haven't been that silly anywhere. Experience however teaches that such
an assumption is hubris.

I think that the way forward is to fix the PG_CATCH issues (which I will
try to get done later this week) and then get someone to mount a serious
effort to break it, as outlined in previous discussions:
http://archives.postgresql.org/pgsql-hackers/2006-08/msg00174.php

I'm willing to enable a SIGTERM-based pg_terminate_backend for 8.4
if there is some reasonable amount of testing done during this
development cycle to try to expose any problems. If no one is willing
to do any such testing, then as far as I'm concerned there is no market
for the feature and we should drop it from TODO.

regards, tom lane

In response to

Responses

Browse pgsql-committers by date

  From Date Subject
Next Message Magnus Hagander 2008-04-16 07:40:13 Re: pg_terminate_backend() issues
Previous Message Bruce Momjian 2008-04-16 04:09:43 Re: pg_terminate_backend() issues

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2008-04-16 05:34:26 Re: WIP: Pg_upgrade - page layout converter (PLC) hook
Previous Message ITAGAKI Takahiro 2008-04-16 04:22:13 Re: Sorting writes during checkpoint