Re: Hot Standy introduced problem with query cancel behavior

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Joachim Wieland <joe(at)mcknight(dot)de>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kris Jurka <books(at)ejurka(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Subject: Re: Hot Standy introduced problem with query cancel behavior
Date: 2010-01-07 15:14:54
Message-ID: 201001071614.55679.andres@anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thursday 07 January 2010 14:45:55 Joachim Wieland wrote:
> On Thu, Dec 31, 2009 at 6:40 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> >> Building racy infrastructure when it can be avoided with a little care
> >> still seems not to be the best path to me.
> >
> > Doing that will add more complexity in an area that is hard to test
> > effectively. I think the risk of introducing further bugs while trying
> > to fix this rare condition is high. Right now the conflict processing
> > needs more work and is often much less precise than this, so improving
> > this aspect of it would not be a priority. I've added it to the TODO
> > though. Thank you for your research.
> >
> > Patch implements recovery conflict signalling using SIGUSR1
> > multiplexing, then uses a SessionCancelPending mode similar to Joachim's
> > TransactionCancelPending.
>
> I have reworked Simon's patch a bit and attach the result.
>
> Quick facts:
>
> - Hot Standby only uses SIGUSR1
> - SIGINT behaves as it did before: it only cancels running statements
> - pg_cancel_backend() continues to use SIGINT
> - I added pg_cancel_idle_transaction() to cancel an idle transaction via
> SIGUSR1
> - One central function HandleCancelAction() sets the flags before calling
> ProcessInterrupts(), it is called from the different signal handlers and
> receives parameters about what it should do
> - If a SIGUSR1 reason is used that will cancel something, ProcArrayLock is
> acquired until the signal has been sent to make sure that we won't signal
> the wrong backend. Does this sufficiently cover the concerns of Andres
> Freund discussed upthread?
I think it solves the major concern (which I btw could easily reproduce using
software that is in production) but unfortunately not completely.
The avoided situation is:

C(Client): BEGIN; SELECT WHATEVER;
S(Standby): conflict with C
S: Starting to cancel C
C: COMMIT
S: Sending Signal to C
C: Wrong transaction is aborted

The situation not avoided is:
C: BEGIN; SELECT ...
S: conflict with C, lock procarray, sending signal(thats asynchronous), unlock
procarray
C: COMMIT; BEGIN
C: Signal arrives
C: Wrong txn is killled

It should be easy to fix this by having a cancel_localTransactionId field in the
procarray which gets cleaned uppon transaction/backend start and gets checked
in the signal handler (should be casted to sig_atomic_t)

Will cookup a patch if nobody speaks against something like that.

Andres

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Sabino Mullane 2010-01-07 15:17:15 Re: Testing with concurrent sessions
Previous Message David Fetter 2010-01-07 15:11:39 Re: Auto-extending table partitions?