Re: [PATCH] Fix CommitTransactionCommand() to CallXactCallbacks() in TBLOCK_ABORT_END

From: Gilles Darold <gilles(at)darold(dot)net>
To: Dave Sharpe <dave(dot)sharpe(at)lzlabs(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Cc: "gilles(at)darold(dot)net" <gilles(at)darold(dot)net>
Subject: Re: [PATCH] Fix CommitTransactionCommand() to CallXactCallbacks() in TBLOCK_ABORT_END
Date: 2020-03-26 16:09:04
Message-ID: 3f50f63e-1ba7-d729-d016-a06afc0d167b@darold.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Le 25/03/2020 à 03:49, Dave Sharpe a écrit :
>
> Hi pghackers,
>
> This is my first time posting here ...  Gilles Darold and I are
> developing a new FDW which is based on the contrib/postgres_fdw. The
> postgres_fdw logic uses a RegisterXactCallback function to send local
> transactions remote. But I found that a registered XactCallback is not
> always called for a successful client ROLLBACK statement. So, a
> successful local ROLLBACK does not get sent remote by FDW, and now the
> local and remote transaction states are messed up, out of sync. The
> local database is "eating" the successful rollback.
>
>  
>
> I attach a git format-patch file
> 0001-Fix-CommitTransactionCommand-to-CallXactCallbacks-in.patch
>
> The patch fixes the problem and is ready to commit as far as I can
> tell. It adds some comment lines and three lines of code to
> CommitTransactionCommand() in the TBLOCK_ABORT_END case. Line (1) to
> reset the transaction's blockState back to TBLOCK_ABORT, ahead of (2)
> a new call to callXactCallbacks(). If the callback returns successful
> (which is usually the case), (3) the new code switches back to
> TBLOCK_ABORT_END, then continues with old code to CleanupTransaction()
> as before. If any callback does error out, the TBLOCK_ABORT was the
> correct block state for an error.
>
>  
>
> I have not added a regression test since my scenario involves a C
> module ... I didn't see such a regression test, but somebody can teach
> me where to put the C module and Makefile if a regression test should
> be added. Heads up that the scenario to hit this is a bit complex (to
> me) ... I attach the following unit test files:
>
> 1) eat_rollback.c, _/PG_init() installs my/_utility_hook() for INFO
> log for debugging.
>
> RegisterSubXactCallback(mySubtransactionCallback) which injects some
> logging and an ERROR for savepoints, i.e., negative testing, e.g.,
> like a remote FDW savepoint failed.
>
> And RegisterXactTransaction(myTransactionCallback) with info logging.
>
> 2) Makefile, to make the eat_rollback.so
>
> 3) eat_rollback2.sql, drives the fail sequence, especially the
> SAVEPOINT, which errors out, then later a successful ROLLBACK gets
> incorrectly eaten (no CALLBACK info logging, demonstrates the bug),
> then another successful ROLLBACK (now there is CALLBACK info logging).
>
> 4) eat_rollback2.out, output without the fix, the rollback at line 68
> is successful but there is not myTransactionCallback() INFO output
>
> 5) eat_rollback2.fixed, output after the fix, the rollback at line 68
> is successful, and now there is a myTransactionCallback() INFO log.
> Success!
>
>  
>
> It first failed for me in v11.1, I suspect it failed since before then
> too. And it is still failing in current master.
>
>  
>
> Bye the way, we worked around the bug in our FDW by handling sub/xact
> in the utility hook. A transaction callback is still needed for
> implicit, internal rollbacks. Another advantage of the workaround is
> (I think) it reduces the code complexity and improves performance
> because the entire subxact stack is not unwound to drive each and
> every subtransaction level to remote. Also sub/transaction statements
> are sent remote as they arrive (local and remote are more
> "transactionally" synced, not stacked by FDW for later). And of
> course, the workaround doesn't hit the above bug, since our utility
> hook correctly handles the client ROLLBACK. If it makes sense to the
> community, I could try and patch contrib/postgres_fdw to do what we
> are doing. But note that I don't need it myself: we have our own new
> FDW for remote DB2 z/OS (!) ... at LzLabs we are a building a software
> defined mainframe with PostgreSQL (of all things).
>
>  
>
> Hope it helps!
>
>  
>
> Dave Sharpe
>
> /I don't necessarily agree with everything I say./(MM)
>
> www.lzlabs.com
>

Hi,

As I'm involved in this thread I have given a review to this bug report
and I don't think that there is a problem here but as a disclaimer my
knowledge on internal transaction management is probably not enough to
have a definitive opinion.

Actually the callback function is called when the error is thrown:

psql:eat_rollback2.sql:20: INFO:  00000: myTransactionCallback()
XactEvent 2 (is abort) level 1 <-------------------
LOCATION:  myTransactionCallback, eat_rollback.c:52
psql:eat_rollback2.sql:20: ERROR:  XX000: no no no
LOCATION:  mySubtransactionCallback, eat_rollback.c:65

this is probably why the callback is not called on the subsequent
ROLLBACK execution because abort processing is already done
(src/backend/access/transam/xact.c:3890).

With this simple test and the debug extension:

BEGIN;
DROP INDEX "index2", "index3"; -- will fail indexes doesn't exist
ROLLBACK;

the callback function is also called at error level, not on the ROLLBACK:

---------------------------------------------------------------------------------BEGIN
psql:eat_rollback-2/sql/eat_rollback3.sql:39: INFO:  00000:
my_utility_hook() utility transaction kind 0 (is not rollback)
LOCATION:  my_utility_hook, eat_rollback.c:28
BEGIN
---------------------------------------------------------------------------------DROP
INDEX + error
psql:eat_rollback-2/sql/eat_rollback3.sql:41: INFO:  00000:
myTransactionCallback() XactEvent 2 (is abort) level 1 <---------------
LOCATION:  myTransactionCallback, eat_rollback.c:52
psql:eat_rollback-2/sql/eat_rollback3.sql:41: ERROR:  42704: index
"concur_index2" does not exist
LOCATION:  DropErrorMsgNonExistent, tablecmds.c:1209
---------------------------------------------------------------------------------ROLLBACK
psql:eat_rollback-2/sql/eat_rollback3.sql:43: INFO:  00000:
my_utility_hook() utility transaction kind 3 (is rollback)
LOCATION:  my_utility_hook, eat_rollback.c:28
ROLLBACK

So this is probably the expected behavior , however the proposed patch
will throw the callback function two times which might not be what we want.

---------------------------------------------------------------------------------BEGIN
psql:eat_rollback-2/sql/eat_rollback3.sql:39: INFO:  00000:
my_utility_hook() utility transaction kind 0 (is not rollback)
LOCATION:  my_utility_hook, eat_rollback.c:28
BEGIN
---------------------------------------------------------------------------------DROP
INDEX + error
psql:eat_rollback-2/sql/eat_rollback3.sql:41: INFO:  00000:
myTransactionCallback() XactEvent 2 (is abort) level 1
<-------------- first call
LOCATION:  myTransactionCallback, eat_rollback.c:52
psql:eat_rollback-2/sql/eat_rollback3.sql:41: ERROR:  42704: index
"concur_index2" does not exist
LOCATION:  DropErrorMsgNonExistent, tablecmds.c:1209
---------------------------------------------------------------------------------ROLLBACK
psql:eat_rollback-2/sql/eat_rollback3.sql:43: INFO:  00000:
my_utility_hook() utility transaction kind 3 (is
rollback)<-------------- second call
LOCATION:  my_utility_hook, eat_rollback.c:28
psql:eat_rollback-2/sql/eat_rollback3.sql:43: INFO:  00000:
myTransactionCallback() XactEvent 2 (is abort) level 1
LOCATION:  myTransactionCallback, eat_rollback.c:52
ROLLBACK

But again I may be wrong.

--
Gilles Darold
http://www.darold.net/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2020-03-26 16:34:52 Re: backup manifests
Previous Message Konstantin Knizhnik 2020-03-26 15:49:51 Re: Columns correlation and adaptive query optimization