Re: Hot Standby introduced problem with query cancel behavior

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Joachim Wieland <joe(at)mcknight(dot)de>, Kris Jurka <books(at)ejurka(dot)com>
Subject: Re: Hot Standby introduced problem with query cancel behavior
Date: 2010-01-12 08:40:03
Message-ID: 1263285603.19367.171544.camel@ebony
Lists: pgsql-hackers

On Tue, 2010-01-12 at 06:30 +0100, Andres Freund wrote:
> On 01/07/10 22:37, Andres Freund wrote:
> > On Thursday 07 January 2010 22:28:46 Tom Lane wrote:
> >> Andres Freund<andres(at)anarazel(dot)de> writes:
> >>> I did not want to suggest using Simons code there. Sorry for the brevity.
> >>> should have read as "revert to old code and add two step killing (that's
> >>> likely around 10 lines or so)".
> >>>
> >>> "two step killing" meaning that we signal ERROR for a few times and if
> >>> nothing happens that we like, we signal FATAL.
> >>> As the code already loops around signaling anyway that should be easy to
> >>> implement.
> >> Ah. This loop happens in the process that's trying to send the cancel
> >> signal, correct, not the one that needs to respond to it? That sounds
> >> fairly sane to me.
> > Yes.
> >
> >
> >> There are some things we could do to make it more likely that a cancel
> >> of this type is accepted --- for instance, give it a distinct SQLSTATE
> >> code that *can not* be trapped by plpgsql EXCEPTION blocks --- but there
> >> is no practical way to guarantee it except elog(FATAL). I'm not
> >> entirely convinced that an untrappable error would be a good thing
> >> anyway; it's hard to argue that that's much better than a FATAL.
> > Well a session which is usable after a transaction abort is quite sensible -
> > quite some software I know handles a failing transaction much more gracefully
> > than a session abort (e.g. because it has to deal with serialization failures
> > and such anyway).
> >
> > So making it caught in fewer places and degrading to FATAL sounds sensible
> > and relatively easy to me.
> > Unless somebody disagrees I will give it a shot.
> Ok, here is a stab at that:
>
> 1. Allow the signal multiplexing facility to transfer one sig_atomic_t
> worth of data. This is useful e.g. for making operations idempotent
> or more precise.
>
> In this case the LocalBackendId is transported - that should make it
> impossible to cancel the wrong transaction

This doesn't remove the possibility of cancelling the wrong transaction,
it just reduces the chances. You can still cancel a session with
LocalBackendId == 1 and then have it accidentally cancel the next
session with the same backendid. That's why I didn't chase this idea
myself earlier.

Closing that loophole isn't a priority and is best left until the other
things have been cleaned up.

> 2.
>
> AbortTransactionAndAnySubtransactions is only used in the mainloops
> error handler as it should be unproblematic there.
>
> In the current CVS code ConditionalVirtualXactLockTableWait() in
> ResolveRecoveryConflictWithVirtualXIDs does the wait for every try of
> cancelling the other transaction.
>
> I moved the retry logic into CancelVirtualTransaction(). If signalling
> ERROR 50 times does nothing, it degrades to FATAL

It's a good idea but not the right one, IMHO. I'd rather that the
backend decided whether the instruction results in an ERROR or a FATAL
based upon its own state, rather than have Startup process bang its head
against the wall 50 times without knowing why.

> XXX: I temporarily do not use the skipExistingConflicts argument of
> GetConflictingVirtualXIDs - I don't understand its purpose and a bit of
> infrastructure is missing right now as the recoveryConflictMode is not
> stored anymore anywhere. Can easily be added back though.

> 3.
> Add a new error code ERRCODE_QUERY_CANCELED_HS for use with HS
> indicating a failure that is more than a plain
> ERRCODE_QUERY_CANCELED - namely it should not be caught from
> various places like savepoints and in PLs.
>
> As an example, I check for that error code in plpgsql and make it
> uncatchable.
>
> I am not sure how new error codes get chosen though - and the name
> is not that nice.
>
> Opinions on that?

I don't trust it yet, as a mechanism. Changing only PL/pgSQL seems
fairly broken also. Again, this seems like something that should be
handled separately.

> I copied quite a bit from Simon's earlier patch...

It changes a few things around and adds the ideas you've mentioned,
though seeing them as code doesn't in itself move the discussion
forwards. There are far too many new ideas in this patch for it to be
even close to acceptable, to me. Yes, let's discuss these additional
ideas, but after a basic patch has been applied.

I do value your interest and input, though racing me to rework my own
patch, then asking me to review it seems like wasted time for both of
us. I will post a new patch on ~Friday.

(Also, please make your braces { } follow the normal coding conventions
for Postgres.)

> Currently the patch does not yet do anything to avoid letting the
> protocol out of sync. What do you think about adding a flag for error
> codes not to communicate with the client (similar to COMMERROR)?
>
> So that one could do an elog(ERROR | ERROR_NO_SEND_CLIENT, ...) or such?

That seems like a fairly important piece.

--
Simon Riggs www.2ndQuadrant.com
