Re: [HACKERS] Transactions involving multiple postgres foreign servers, take 2

From: Chris Travers <chris(dot)travers(at)gmail(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Subject: Re: [HACKERS] Transactions involving multiple postgres foreign servers, take 2
Date: 2018-10-03 07:40:11
Message-ID: 153855241179.1560.2858073026056950377.pgcf@coridan.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

The following review has been posted through the commitfest application:
make installcheck-world: tested, failed
Implements feature: not tested
Spec compliant: not tested
Documentation: tested, failed

I am hoping I am not out of order in writing this before the commitfest starts. The patch is big and long and so wanted to start on this while traffic is slow.

I find this patch quite welcome and very close to a minimum viable version. The few significant limitations can be resolved later. One thing I may have missed in the documentation is a discussion of the limits of the current approach. I think this would be important to document because the caveats of the current approach are significant, but the people who need it will have the knowledge to work with issues if they come up.

The major caveat I see in our past discussions and (if I read the patch correctly) is that the resolver goes through global transactions sequentially and does not move on to the next until the previous one is resolved. This means that if I have a global transaction on server A, with foreign servers B and C, and I have another one on server A with foreign servers C and D, if server B goes down at the wrong moment, the background worker does not look like it will detect the failure and move on to try to resolve the second, so server D will have a badly set vacuum horizon until this is resolved. Also if I read the patch correctly, it looks like one can invoke SQL commands to remove the bad transaction to allow processing to continue and manual resolution (this is good and necessary because in this area there is no ability to have perfect recoverability without occasional administrative action). I would really like to see more documentation of failure cases and appropriate administrative action at present. Otherwise this is I think a minimum viable addition and I think we want it.

It is possible i missed that in the documentation. If so, my objection stands aside. If it is welcome I am happy to take a first crack at such docs.

To my mind thats the only blocker in the code (but see below). I can say without a doubt that I would expect we would use this feature once available.

------------------

Testing however failed.

make installcheck-world fails with errors like the following:

-- Modify foreign server and raise an error
BEGIN;
INSERT INTO ft7_twophase VALUES(8);
+ ERROR: prepread foreign transactions are disabled
+ HINT: Set max_prepared_foreign_transactions to a nonzero value.
INSERT INTO ft8_twophase VALUES(NULL); -- violation
! ERROR: current transaction is aborted, commands ignored until end of transaction block
ROLLBACK;
SELECT * FROM ft7_twophase;
! ERROR: prepread foreign transactions are disabled
! HINT: Set max_prepared_foreign_transactions to a nonzero value.
SELECT * FROM ft8_twophase;
! ERROR: prepread foreign transactions are disabled
! HINT: Set max_prepared_foreign_transactions to a nonzero value.
-- Rollback foreign transaction that involves both 2PC-capable
-- and 2PC-non-capable foreign servers.
BEGIN;
INSERT INTO ft8_twophase VALUES(7);
+ ERROR: prepread foreign transactions are disabled
+ HINT: Set max_prepared_foreign_transactions to a nonzero value.
INSERT INTO ft9_not_twophase VALUES(7);
+ ERROR: current transaction is aborted, commands ignored until end of transaction block
ROLLBACK;
SELECT * FROM ft8_twophase;
! ERROR: prepread foreign transactions are disabled
! HINT: Set max_prepared_foreign_transactions to a nonzero value.

make installcheck in the contrib directory shows the same, so that's the easiest way of reproducing, at least on a new installation. I think the test cases will have to handle that sort of setup.

make check in the contrib directory passes.

For reasons of test failures, I am setting this back to waiting on author.

------------------
I had a few other thoughts that I figure are worth sharing with the community on this patch with the idea that once it is in place, this may open up more options for collaboration in the area of federated and distributed storage generally. I could imagine other foreign data wrappers using this API, and folks might want to refactor out the atomic handling part so that extensions that do not use the foreign data wrapper structure could use it as well (while this looks like a classic SQL/MED issue, I am not sure that only foreign data wrappers would be interested in the API.

The new status of this patch is: Waiting on Author

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2018-10-03 07:53:19 Re: Segfault when creating partition with a primary key and sql_drop trigger exists
Previous Message Masahiko Sawada 2018-10-03 07:37:29 Re: pgsql: Improve autovacuum logging for aggressive and anti-wraparound ru