[PATCH] Fix CommitTransactionCommand() to CallXactCallbacks() in TBLOCK_ABORT_END

From: Dave Sharpe <dave(dot)sharpe(at)lzlabs(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Cc: "gilles(at)darold(dot)net" <gilles(at)darold(dot)net>
Subject: [PATCH] Fix CommitTransactionCommand() to CallXactCallbacks() in TBLOCK_ABORT_END
Date: 2020-03-25 02:49:55
Message-ID: A63CC049-ECFB-460F-9B55-BAF14B169774@lzlabs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi pghackers,
This is my first time posting here ... Gilles Darold and I are developing a new FDW which is based on the contrib/postgres_fdw. The postgres_fdw logic uses a RegisterXactCallback function to send local transactions remote. But I found that a registered XactCallback is not always called for a successful client ROLLBACK statement. So, a successful local ROLLBACK does not get sent remote by FDW, and now the local and remote transaction states are messed up, out of sync. The local database is "eating" the successful rollback.

I attach a git format-patch file 0001-Fix-CommitTransactionCommand-to-CallXactCallbacks-in.patch
The patch fixes the problem and is ready to commit as far as I can tell. It adds some comment lines and three lines of code to CommitTransactionCommand() in the TBLOCK_ABORT_END case. Line (1) to reset the transaction's blockState back to TBLOCK_ABORT, ahead of (2) a new call to callXactCallbacks(). If the callback returns successful (which is usually the case), (3) the new code switches back to TBLOCK_ABORT_END, then continues with old code to CleanupTransaction() as before. If any callback does error out, the TBLOCK_ABORT was the correct block state for an error.

I have not added a regression test since my scenario involves a C module ... I didn't see such a regression test, but somebody can teach me where to put the C module and Makefile if a regression test should be added. Heads up that the scenario to hit this is a bit complex (to me) ... I attach the following unit test files:
1) eat_rollback.c, _PG_init() installs my_utility_hook() for INFO log for debugging.
RegisterSubXactCallback(mySubtransactionCallback) which injects some logging and an ERROR for savepoints, i.e., negative testing, e.g., like a remote FDW savepoint failed.
And RegisterXactTransaction(myTransactionCallback) with info logging.
2) Makefile, to make the eat_rollback.so
3) eat_rollback2.sql, drives the fail sequence, especially the SAVEPOINT, which errors out, then later a successful ROLLBACK gets incorrectly eaten (no CALLBACK info logging, demonstrates the bug), then another successful ROLLBACK (now there is CALLBACK info logging).
4) eat_rollback2.out, output without the fix, the rollback at line 68 is successful but there is not myTransactionCallback() INFO output
5) eat_rollback2.fixed, output after the fix, the rollback at line 68 is successful, and now there is a myTransactionCallback() INFO log. Success!

It first failed for me in v11.1, I suspect it failed since before then too. And it is still failing in current master.

Bye the way, we worked around the bug in our FDW by handling sub/xact in the utility hook. A transaction callback is still needed for implicit, internal rollbacks. Another advantage of the workaround is (I think) it reduces the code complexity and improves performance because the entire subxact stack is not unwound to drive each and every subtransaction level to remote. Also sub/transaction statements are sent remote as they arrive (local and remote are more "transactionally" synced, not stacked by FDW for later). And of course, the workaround doesn't hit the above bug, since our utility hook correctly handles the client ROLLBACK. If it makes sense to the community, I could try and patch contrib/postgres_fdw to do what we are doing. But note that I don't need it myself: we have our own new FDW for remote DB2 z/OS (!) ... at LzLabs we are a building a software defined mainframe with PostgreSQL (of all things).

Hope it helps!

Dave Sharpe
I don't necessarily agree with everything I say. (MM)
www.lzlabs.com

Attachment Content-Type Size
0001-Fix-CommitTransactionCommand-to-CallXactCallbacks-in.patch application/octet-stream 1.2 KB
eat_rollback.c application/octet-stream 2.9 KB
Makefile application/octet-stream 368 bytes
eat_rollback2.sql application/octet-stream 1.3 KB
eat_rollback2.out application/octet-stream 4.9 KB
eat_rollback2.fixed application/octet-stream 5.1 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2020-03-25 03:02:39 Re: improve transparency of bitmap-only heap scans
Previous Message Amit Langote 2020-03-25 02:36:52 Re: Run-time pruning for ModifyTable