From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Cc: | Joachim Wieland <joe(at)mcknight(dot)de>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kris Jurka <books(at)ejurka(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
Subject: | Re: Hot Standy introduced problem with query cancel behavior |
Date: | 2010-01-07 15:14:54 |
Message-ID: | 201001071614.55679.andres@anarazel.de |
Lists: | pgsql-hackers |
On Thursday 07 January 2010 14:45:55 Joachim Wieland wrote:
> On Thu, Dec 31, 2009 at 6:40 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> >> Building racy infrastructure when it can be avoided with a little care
> >> still seems not to be the best path to me.
> >
> > Doing that will add more complexity in an area that is hard to test
> > effectively. I think the risk of introducing further bugs while trying
> > to fix this rare condition is high. Right now the conflict processing
> > needs more work and is often much less precise than this, so improving
> > this aspect of it would not be a priority. I've added it to the TODO
> > though. Thank you for your research.
> >
> > Patch implements recovery conflict signalling using SIGUSR1
> > multiplexing, then uses a SessionCancelPending mode similar to Joachim's
> > TransactionCancelPending.
>
> I have reworked Simon's patch a bit and attach the result.
>
> Quick facts:
>
> - Hot Standby only uses SIGUSR1
> - SIGINT behaves as it did before: it only cancels running statements
> - pg_cancel_backend() continues to use SIGINT
> - I added pg_cancel_idle_transaction() to cancel an idle transaction via
> SIGUSR1
> - One central function HandleCancelAction() sets the flags before calling
> ProcessInterrupts(), it is called from the different signal handlers and
> receives parameters about what it should do
> - If a SIGUSR1 reason is used that will cancel something, ProcArrayLock is
> acquired until the signal has been sent to make sure that we won't signal
> the wrong backend. Does this sufficiently cover the concerns of Andres
> Freund discussed upthread?
I think it solves the major concern (which, btw, I could easily reproduce using
software that is in production), but unfortunately not completely.
The avoided situation is:
C(Client): BEGIN; SELECT WHATEVER;
S(Standby): conflict with C
S: Starting to cancel C
C: COMMIT
S: Sending Signal to C
C: Wrong transaction is aborted
The situation not avoided is:
C: BEGIN; SELECT ...
S: conflict with C, lock procarray, sending signal (that's asynchronous), unlock
procarray
C: COMMIT; BEGIN
C: Signal arrives
C: Wrong txn is killed
It should be easy to fix this by having a cancel_localTransactionId field in the
procarray which gets cleared upon transaction/backend start and gets checked
in the signal handler (should be cast to sig_atomic_t).
Will cook up a patch if nobody speaks against something like that.
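A minimal sketch of that idea, not the actual patch: process-local variables
stand in for the backend's procarray entry, and raise() stands in for the
asynchronously delivered signal from the standby. Every name except
cancel_localTransactionId is hypothetical, and the locking around the
"mark target" step (ProcArrayLock) is left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not PostgreSQL's real structures. */
#define INVALID_LXID 0

/* State a real patch would keep in the backend's procarray entry. */
static volatile sig_atomic_t current_lxid = INVALID_LXID;              /* running local txn */
static volatile sig_atomic_t cancel_localTransactionId = INVALID_LXID; /* txn to be cancelled */
static volatile sig_atomic_t cancel_pending = 0;
static int next_lxid = 1;

/* Signal handler: act only if the request still names the running txn. */
static void handle_conflict_signal(int signo)
{
    (void) signo;
    if (cancel_localTransactionId != INVALID_LXID &&
        cancel_localTransactionId == current_lxid)
        cancel_pending = 1;
}

static void install_handler(void)
{
    struct sigaction sa;

    sa.sa_handler = handle_conflict_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGUSR1, &sa, NULL);
}

/* Backend side: starting a transaction clears any stale cancel request. */
static void begin_transaction(void)
{
    cancel_localTransactionId = INVALID_LXID;
    cancel_pending = 0;
    current_lxid = next_lxid++;
}

static void commit_transaction(void)
{
    current_lxid = INVALID_LXID;
}

/* Standby side: record which txn the upcoming signal is meant for. */
static void mark_cancel_target(int lxid)
{
    assert(lxid != INVALID_LXID);
    cancel_localTransactionId = lxid;
}

/* Replays the "not avoided" sequence; returns true if the new txn got hit. */
bool late_signal_cancels_new_transaction(void)
{
    install_handler();
    begin_transaction();                /* C: BEGIN; SELECT ... */
    mark_cancel_target(current_lxid);   /* S: conflict with C, signal "sent" */
    commit_transaction();               /* C: COMMIT */
    begin_transaction();                /* C: BEGIN (clears the request) */
    raise(SIGUSR1);                     /* C: signal arrives only now */
    return cancel_pending != 0;
}

/* Control case: the signal arrives while the targeted txn is still running. */
bool prompt_signal_cancels_target(void)
{
    install_handler();
    begin_transaction();
    mark_cancel_target(current_lxid);
    raise(SIGUSR1);
    return cancel_pending != 0;
}
```

With the check in the handler, the late signal in the first scenario is ignored
because the request was cleared at the second BEGIN, while the second scenario
still cancels the transaction it was aimed at.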
Andres