Re: Transactions involving multiple postgres foreign servers, take 2

From: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
To: Muhammad Usama <m(dot)usama(at)gmail(dot)com>
Cc: amul sul <sulamul(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Ildar Musin <ildar(at)adjust(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Chris Travers <chris(dot)travers(at)adjust(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>
Subject: Re: Transactions involving multiple postgres foreign servers, take 2
Date: 2020-04-30 11:43:23
Message-ID: CA+fd4k62Qy+AUkmXa=0gcTCYk06=Rcyyg9fyNxHr2DCB-=aegw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 28 Apr 2020 at 19:37, Muhammad Usama <m(dot)usama(at)gmail(dot)com> wrote:
>
>
>
> On Wed, Apr 8, 2020 at 11:16 AM Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com> wrote:
>>
>> On Fri, 27 Mar 2020 at 22:06, Muhammad Usama <m(dot)usama(at)gmail(dot)com> wrote:
>> >
>> > Hi Sawada San,
>> >
>> > I have been further reviewing and testing the transaction involving multiple server patches.
>> > Overall the patches are working as expected bar a few important exceptions.
>> > So as discussed over the call I have fixed the issues I found during the testing
>> > and also rebased the patches with the current head of the master branch.
>> > So can you please have a look at the attached updated patches.
>>
>> Thank you for reviewing and updating the patch!
>>
>> >
>> > Below is the list of changes I have made on top of V18 patches.
>> >
>> > 1- In register_fdwxact(), As we are just storing the callback function pointers from
>> > FdwRoutine in fdw_part structure, So I think we can avoid calling
>> > GetFdwRoutineByServerId() in TopMemoryContext.
>> > So I have moved the MemoryContextSwitch to TopMemoryContext after the
>> > GetFdwRoutineByServerId() call.
>>
>> Agreed.
>>
>> >
>> >
>> > 2- If PrepareForeignTransaction functionality is not present in some FDW then
>> > during the registration process we should only set the XACT_FLAGS_FDWNOPREPARE
>> > transaction flag if the modified flag is also set for that server. As for the server that has
>> > not done any data modification within the transaction we do not do two-phase commit anyway.
>>
>> Agreed.
>>
>> >
>> > 3- I have moved the foreign_twophase_commit in sample file after
>> > max_foreign_transaction_resolvers because the default value of max_foreign_transaction_resolvers
>> > is 0 and enabling the foreign_twophase_commit produces an error with default
>> > configuration parameter positioning in postgresql.conf
>> > Also, foreign_twophase_commit configuration was missing the comments
>> > about allowed values in the sample config file.
>>
>> Sounds good. Agreed.
>>
>> >
>> > 4- Setting ForeignTwophaseCommitIsRequired in is_foreign_twophase_commit_required()
>> > function does not seem to be the correct place. The reason being, even when
>> > is_foreign_twophase_commit_required() returns true after setting ForeignTwophaseCommitIsRequired
>> > to true, we could still end up not using the two-phase commit in the case when some server does
>> > not support two-phase commit and foreign_twophase_commit is set to FOREIGN_TWOPHASE_COMMIT_PREFER
>> > mode. So I have moved the ForeignTwophaseCommitIsRequired assignment to PreCommit_FdwXacts()
>> > function after doing the prepare transaction.
>>
>> Agreed.
>>
>> >
>> > 6- In prefer mode, we commit the transaction in single-phase if the server does not support
>> > the two-phase commit. But instead of doing the single-phase commit right away,
>> > IMHO the better way is to wait until all the two-phase transactions are successfully prepared
>> > on servers that support the two-phase. Since an error during a "PREPARE" stage would
>> > rollback the transaction and in that case, we would end up with committed transactions on
>> > the server that lacks the support of the two-phase commit.
>>
>> When an error occurred before the local commit, a 2pc-unsupported
>> server could be rolled back or committed depending on the error
>> timing. On the other hand all 2pc-supported servers are always rolled
>> back when an error occurred before the local commit. Therefore even if
>> we change the order of COMMIT and PREPARE it is still possible that we
>> will end up committing the part of 2pc-unsupported servers while
>> rolling back others including 2pc-supported servers.
>>
>> I guess the motivation of your change is that since errors are likely
>> to happen during executing PREPARE on foreign servers, we can minimize
>> the possibility of rolling back 2pc-unsupported servers by deferring
>> the commit of 2pc-unsupported server as much as possible. Is that
>> right?
>
>
> Yes, that is correct. The idea of doing the COMMIT on NON-2pc-supported servers
> after all the PREPAREs are successful is to minimize the chances of partial commits.
> And as you mentioned there will still be chances of getting a partial commit even with
> this approach but the probability of that would be less than what it is with the
> current sequence.
>
>
>>
>>
>> > So I have modified the flow a little bit and instead of doing a one-phase commit right away
>> > the servers that do not support a two-phase commit is added to another list and that list is
>> > processed after once we have successfully prepared all the transactions on two-phase supported
>> > foreign servers. Although this technique is also not bulletproof, still it is better than doing
>> > the one-phase commits before doing the PREPAREs.
>>
>> Hmm the current logic seems complex. Maybe we can just reverse the
>> order of COMMIT and PREPARE; do PREPARE on all 2pc-supported and
>> modified servers first and then do COMMIT on others?
>
>
> Agreed, seems reasonable.
>>
>>
>> >
>> > Also, I think we can improve on this one by throwing an error even in PREFER
>> > mode if there is more than one server that had data modified within the transaction
>> > and lacks the two-phase commit support.
>> >
>>
>> IIUC the concept of PREFER mode is that the transaction uses 2pc only
>> for 2pc-supported servers. IOW, even if the transaction modifies on a
>> 2pc-unsupported server we can proceed with the commit if in PREFER
>> mode, which cannot if in REQUIRED mode. What is the motivation of your
>> above idea?
>
>
> I was thinking that we could change the behavior of PREFER mode such that we only allow
> to COMMIT the transaction if the transaction needs to do a single-phase commit on one
> server only. That way we can ensure that we would never end up with partial commit.
>

I think it's good to avoid a partial commit by using your idea but if
we want to avoid a partial commit we can use the 'required' mode,
which requires all participant servers to support 2pc. We throw an
error if participant servers include even one 2pc-unsupported server
is modified within the transaction. Of course if the participant node
is only one 2pc-unsupported server it can use 1pc even in the
'required' mode.

> One Idea in this regards would be to switch the local transaction to commit using 2pc
> if there is a total of only one foreign server that does not support the 2pc in the transaction,
> ensuring that 1-pc commit servers should always be less than or equal to 1. and if there are more
> than one foreign server requires 1-pc then we just throw an error.

I might be missing your point but I suppose this idea is to do
something like the following?

1. prepare the local transaction
2. commit the foreign transaction on 2pc-unsupported server
3. commit the prepared local transaction

>
> However having said that, I am not 100% sure if its a good or an acceptable Idea, and
> I am okay with continuing with the current behavior of PREFER mode if we put it in the
> document that this mode can cause a partial commit.

There will three types of servers: (a) a server doesn't support any
transaction API, (b) a server supports only commit and rollback API
and (c) a server supports all APIs (commit, rollback and prepare).
Currently postgres transaction manager manages only server-(b) and
server-(c), adds them to FdwXactParticipants. I'm considering changing
the code so that it adds also server-(a) to FdwXactParticipants, in
order to track the number of server-(a) involved in the transaction.
But it doesn't insert FdwXact entry for it, and manage transactions on
these servers.

The reason is this; if we want to have the 'required' mode strictly
require all participant servers to support 2pc, we should use 2pc when
(# of server-(a) + # of server-(b) + # of server-(c)) >= 2. But since
currently we just track the modification on a server-(a) by a flag we
cannot handle the case where two server-(a) are modified in the
transaction. On the other hand, if we don't consider server-(a) the
transaction could end up with a partial commit when a server-(a)
participates in the transaction. Therefore I'm thinking of the above
change so that the transaction manager can ensure that a partial
commit doesn't happen in the 'required' mode. What do you think?

>
>>
>> > 7- Added a pfree() and list_free_deep() in PreCommit_FdwXacts() to reclaim the
>> > memory if fdw_part is removed from the list
>>
>> I think at the end of the transaction we free entries of
>> FdwXactParticipants list and set FdwXactParticipants to NIL. Why do we
>> need to do that in PreCommit_FdwXacts()?
>
>
> Correct me if I am wrong, The fdw_part structures are created in TopMemoryContext
> and if that fdw_part structure is removed from the list at pre_commit stage
> (because we did 1-PC COMMIT on it) then it would leak memory.

The fdw_part structures are created in TopTransactionContext so these
are freed at the end of the transaction.

>
>>
>> >
>> > 8- The function FdwXactWaitToBeResolved() was bailing out as soon as it finds
>> > (FdwXactParticipants == NIL). The problem with that was in the case of
>> > "COMMIT/ROLLBACK PREPARED" we always get FdwXactParticipants = NIL and
>> > effectively the foreign prepared transactions(if any) associated with locally
>> > prepared transactions were never getting resolved automatically.
>> >
>> >
>> > postgres=# BEGIN;
>> > BEGIN
>> > INSERT INTO test_local VALUES ( 2, 'TWO');
>> > INSERT 0 1
>> > INSERT INTO test_foreign_s1 VALUES ( 2, 'TWO');
>> > INSERT 0 1
>> > INSERT INTO test_foreign_s2 VALUES ( 2, 'TWO');
>> > INSERT 0 1
>> > postgres=*# PREPARE TRANSACTION 'local_prepared';
>> > PREPARE TRANSACTION
>> >
>> > postgres=# select * from pg_foreign_xacts ;
>> > dbid | xid | serverid | userid | status | in_doubt | identifier
>> > -------+-----+----------+--------+----------+----------+----------------------------
>> > 12929 | 515 | 16389 | 10 | prepared | f | fx_1339567411_515_16389_10
>> > 12929 | 515 | 16391 | 10 | prepared | f | fx_1963224020_515_16391_10
>> > (2 rows)
>> >
>> > -- Now commit the prepared transaction
>> >
>> > postgres=# COMMIT PREPARED 'local_prepared';
>> >
>> > COMMIT PREPARED
>> >
>> > --Foreign prepared transactions associated with 'local_prepared' not resolved
>> >
>> > postgres=#
>> >
>> > postgres=# select * from pg_foreign_xacts ;
>> > dbid | xid | serverid | userid | status | in_doubt | identifier
>> > -------+-----+----------+--------+----------+----------+----------------------------
>> > 12929 | 515 | 16389 | 10 | prepared | f | fx_1339567411_515_16389_10
>> > 12929 | 515 | 16391 | 10 | prepared | f | fx_1963224020_515_16391_10
>> > (2 rows)
>> >
>> >
>> > So to fix this in case of the two-phase transaction, the function checks the existence
>> > of associated foreign prepared transactions before bailing out.
>> >
>>
>> Good catch. But looking at your change, we should not accept the case
>> where FdwXactParticipants == NULL but TwoPhaseExists(wait_xid) ==
>> false.
>>
>> if (FdwXactParticipants == NIL)
>> {
>> /*
>> * If we are here because of COMMIT/ROLLBACK PREPARED then the
>> * FdwXactParticipants list would be empty. So we need to
>> * see if there are any foreign prepared transactions exists
>> * for this prepared transaction
>> */
>> if (TwoPhaseExists(wait_xid))
>> {
>> List *foreign_trans = NIL;
>>
>> foreign_trans = get_fdwxacts(MyDatabaseId,
>> wait_xid, InvalidOid, InvalidOid,
>> false, false, true);
>>
>> if (foreign_trans == NIL)
>> return;
>> list_free(foreign_trans);
>> }
>> }
>>
>
> Sorry my bad, its a mistake on my part. we should just return from the function when
> FdwXactParticipants == NULL but TwoPhaseExists(wait_xid) == false.
>
> if (TwoPhaseExists(wait_xid))
> {
> List *foreign_trans = NIL;
> foreign_trans = get_fdwxacts(MyDatabaseId, wait_xid, InvalidOid, InvalidOid,
> false, false, true);
>
> if (foreign_trans == NIL)
> return;
> list_free(foreign_trans);
> }
> else
> return;
>
>>
>> > 9- In function XlogReadFdwXactData() XLogBeginRead call was missing before XLogReadRecord()
>> > that was causing the crash during recovery.
>>
>> Agreed.
>>
>> >
>> > 10- incorporated set_ps_display() signature change.
>>
>> Thanks.
>>
>> Regarding other changes you did in v19 patch, I have some comments:
>>
>> 1.
>> + ereport(LOG,
>> + (errmsg("trying to %s the foreign transaction
>> associated with transaction %u on server %u",
>> + fdwxact->status ==
>> FDWXACT_STATUS_COMMITTING?"COMMIT":"ABORT",
>> + fdwxact->local_xid,
>> fdwxact->serverid)));
>> +
>>
>> Why do we need to emit LOG message in pg_resolve_foreign_xact() SQL function?
>
>
> That change was not intended to get into the patch file. I had done it during testing to
> quickly get info on which way the transaction is going to be resolved.
>
>>
>> 2.
>> diff --git a/src/bin/pg_waldump/fdwxactdesc.c b/src/bin/pg_waldump/fdwxactdesc.c
>> deleted file mode 120000
>> index ce8c21880c..0000000000
>> --- a/src/bin/pg_waldump/fdwxactdesc.c
>> +++ /dev/null
>> @@ -1 +0,0 @@
>> -../../../src/backend/access/rmgrdesc/fdwxactdesc.c
>> \ No newline at end of file
>> diff --git a/src/bin/pg_waldump/fdwxactdesc.c b/src/bin/pg_waldump/fdwxactdesc.c
>> new file mode 100644
>> index 0000000000..ce8c21880c
>> --- /dev/null
>> +++ b/src/bin/pg_waldump/fdwxactdesc.c
>> @@ -0,0 +1 @@
>> +../../../src/backend/access/rmgrdesc/fdwxactdesc.c
>>
>> We need to remove src/bin/pg_waldump/fdwxactdesc.c from the patch.
>
>
> Again sorry! that was an oversight on my part.
>
>>
>> 3.
>> --- a/doc/src/sgml/monitoring.sgml
>> +++ b/doc/src/sgml/monitoring.sgml
>> @@ -1526,14 +1526,14 @@ postgres 27093 0.0 0.0 30096 2752 ?
>> Ss 11:34 0:00 postgres: ser
>> <entry><literal>SafeSnapshot</literal></entry>
>> <entry>Waiting for a snapshot for a <literal>READ ONLY
>> DEFERRABLE</literal> transaction.</entry>
>> </row>
>> - <row>
>> - <entry><literal>SyncRep</literal></entry>
>> - <entry>Waiting for confirmation from remote server during
>> synchronous replication.</entry>
>> - </row>
>> <row>
>> <entry><literal>FdwXactResolution</literal></entry>
>> <entry>Waiting for all foreign transaction participants to
>> be resolved during atomic commit among foreign servers.</entry>
>> </row>
>> + <row>
>> + <entry><literal>SyncRep</literal></entry>
>> + <entry>Waiting for confirmation from remote server during
>> synchronous replication.</entry>
>> + </row>
>> <row>
>> <entry morerows="4"><literal>Timeout</literal></entry>
>> <entry><literal>BaseBackupThrottle</literal></entry>
>>
>> We need to move the entry of FdwXactResolution to right before
>> Hash/Batch/Allocating for alphabetical order.
>
>
> Agreed!
>>
>>
>> I've incorporated your changes I agreed with to my local branch and
>> will incorporate other changes after discussion. I'll also do more
>> test and self-review and will submit the latest version patch.
>>
>
> Meanwhile, I found a couple of more small issues, One is the break statement missing
> i n pgstat_get_wait_ipc() and secondly fdwxact_relaunch_resolvers()
> could return un-initialized value.
> I am attaching a small patch for these changes that can be applied on top of existing
> patches.

Thank you for the patch!

I'm updating the patches because current behavior in error case would
not be good. For example, when an error occurs in the prepare phase,
prepared transactions are left as in-doubt transaction. And these
transactions are not handled by the resolver process. That means that
a user could need to resolve these transactions manually every abort
time, which is not good. In abort case, I think that prepared
transactions can be resolved by the backend itself, rather than
leaving them for the resolver. I'll submit the updated patch.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Victor Wagner 2020-04-30 12:06:08 Postgres Windows build system doesn't work with python installed in Program Files
Previous Message Amit Kapila 2020-04-30 11:12:02 Re: Optimization for hot standby XLOG_STANDBY_LOCK redo