Re: Hot Standy introduced problem with query cancel behavior

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, Joachim Wieland <joe(at)mcknight(dot)de>, Kris Jurka <books(at)ejurka(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Subject: Re: Hot Standy introduced problem with query cancel behavior
Date: 2009-12-31 00:09:57
Message-ID: 603c8f070912301609g37be0361k35d09aefd6be2ced@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Dec 30, 2009 at 6:43 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> ----- Ursprüngliche Mitteilung -----
>> On Wed, Dec 30, 2009 at 2:06 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> > On Wednesday 30 December 2009 01:13:01 Simon Riggs wrote:
>> > > On Tue, 2009-12-29 at 11:13 -0500, Tom Lane wrote:
>> > > > Andres Freund <andres(at)anarazel(dot)de> writes:
>> > > > > On Tuesday 29 December 2009 16:22:54 Tom Lane wrote:
>> > > > > > This seems like a fairly bad idea.  One of the intended use-cases is
>> > > > > > to be able to manually "kill -INT" a misbehaving backend.  Assuming
>> > > > > > that there will be valid info about the signal in shared memory will
>> > > > > > break that.
>> > > > >
>> > > > > Well. That already is the case now. MyProc->recoveryConflictMode is
>> > > > > checked to recognize what kind of conflict is being resolved...
>> > > >
>> > > > In that case, HS has already broken it, and we need to fix it not make
>> > > > it worse.
>> > > >
>> > > > My humble opinion is that SIGINT should not be overloaded with multiple
>> > > > meanings.  We already have a multiplexed signal mechanism, which is what
>> > > > should be used for any additional signal reasons HS may need to
>> > > > introduce.
>> > >
>> > > It's a revelation to me, but yes, I see it now and agree.
>> > >
>> > > I'm looking at Fujii-san's multiplexing patch from Jul 31 to rewrite
>> > > this code using that mechanism. It sounds like it's a neat fit and it
>> > > should get around the bug report from Kris also if it all works.
>> > Hm. I just read a bit of that multiplexing facility (out of a different reason)
>> > and I have some doubt about it being used unmodified for canceling backends:
>> >
>> > procsignal.c:
>> > /*
>> >  * Note: Since there's no locking, it's possible that the target
>> >  * process detaches from shared memory and exits right after this
>> >  * test, before we set the flag and send signal. And the signal slot
>> >  * might even be recycled by a new process, so it's remotely possible
>> >  * that we set a flag for a wrong process. That's OK, all the signals
>> >  * are such that no harm is done if they're mistakenly fired.
>> >  */
>> > procsignal.h:
>> > ...
>> >  * Also, because of race conditions, it's important that all the signals be
>> >  * defined so that no harm is done if a process mistakenly receives one.
>> >  */
>> >
>> > When cancelling a backend that behaviour could be a bit annoying ;-)
>> >
>> > I guess locking procarray during sending the signal should be enough?
>>
>> I think the idea is that you define the behavior of the signal to be
>> "look at this other piece of state to see whether you should cancel
>> yourself" rather than just "cancel yourself".  Then if a signal is
>> delivered by mistake, it's no big deal - you just look at the other
>> piece of state and decide that you don't need to do anything.
> I dont see an easy way to pass enough information right now. You cant regenerate enough of it in the to be killed backend as most of the relevant information is only available in the startup process.
> Inventing yet another segment in shm just for this seems overcomplicated to me.
> Thats why I suggested locking the procarray for this - without having looked at the code that should prevent a backend slot from beimg reused.

Yeah, I understand, but I have a feeling that the code doesn't do it
that way right now for a reason. Someone who understands this better
than I should comment, but I'm thinking you would likely need to lock
the ProcArray in CheckProcSignal as well, and I'm thinking that can't
be safely done from within a signal handler.

...Robert

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2009-12-31 00:17:44 Re: Status of plperl inter-sp calling
Previous Message Magnus Hagander 2009-12-30 23:57:55 Re: krb_server_keyfile setting doesn't work on Windows