Re: Transactions involving multiple postgres foreign servers, take 2

From: Masahiro Ikeda <ikedamsh(at)oss(dot)nttdata(dot)com>
To: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
Cc: Muhammad Usama <m(dot)usama(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, amul sul <sulamul(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Álvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Ildar Musin <ildar(at)adjust(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Chris Travers <chris(dot)travers(at)adjust(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>, Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
Subject: Re: Transactions involving multiple postgres foreign servers, take 2
Date: 2020-08-28 08:49:59
Message-ID: 75210c4351f6c9e4693e1d9ce78749fe@oss.nttdata.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On 2020-07-17 15:55, Masahiko Sawada wrote:
>> On Fri, 17 Jul 2020 at 11:06, Masahiro Ikeda
>> <ikedamsh(at)oss(dot)nttdata(dot)com>
>> wrote:
>>>
>>> On 2020-07-16 13:16, Masahiko Sawada wrote:
>>>> On Tue, 14 Jul 2020 at 17:24, Masahiro Ikeda
>>>> <ikedamsh(at)oss(dot)nttdata(dot)com>
>>>> wrote:
>>>>>
>>>>>> I've attached the latest version patches. I've incorporated the
>>>>>> review
>>>>>> comments I got so far and improved locking strategy.
>>>>>
>>>>> I want to ask a question about streaming replication with 2PC.
>>>>> Are you going to support 2PC with streaming replication?
>>>>>
>>>>> I tried streaming replication using v23 patches.
>>>>> I confirm that 2PC works with streaming replication,
>>>>> which there are primary/standby coordinator.
>>>>>
>>>>> But, in my understanding, the WAL of "PREPARE" and
>>>>> "COMMIT/ABORT PREPARED" can't be replicated to the standby server
>>>>> in
>>>>> sync.
>>>>>
>>>>> If this is right, the unresolved transaction can be occurred.
>>>>>
>>>>> For example,
>>>>>
>>>>> 1. PREPARE is done
>>>>> 2. crash primary before the WAL related to PREPARE is
>>>>> replicated to the standby server
>>>>> 3. promote standby server // but can't execute "ABORT PREPARED"
>>>>>
>>>>> In above case, the remote server has the unresolved transaction.
>>>>> Can we solve this problem to support in-sync replication?
>>>>>
>>>>> But, I think some users use async replication for performance.
>>>>> Do we need to document the limitation or make another solution?
>>>>>
>>>>
>>>> IIUC with synchronous replication, we can guarantee that WAL records
>>>> are written on both primary and replicas when the client got an
>>>> acknowledgment of commit. We don't replicate each WAL records
>>>> generated during transaction one by one in sync. In the case you
>>>> described, the client will get an error due to the server crash.
>>>> Therefore I think the user cannot expect WAL records generated so
>>>> far
>>>> has been replicated. The same issue could happen also when the user
>>>> executes PREPARE TRANSACTION and the server crashes.
>>>
>>> Thanks! I didn't noticed the behavior when a user executes PREPARE
>>> TRANSACTION is same.
>>>
>>> IIUC with 2PC, there is a different point between (1)PREPARE
>>> TRANSACTION
>>> and (2)2PC.
>>> The point is that whether the client can know when the server crashed
>>> and it's global tx id.
>>>
>>> If (1)PREPARE TRANSACTION is failed, it's ok the client execute same
>>> command
>>> because if the remote server is already prepared the command will be
>>> ignored.
>>>
>>> But, if (2)2PC is failed with coordinator crash, the client can't
>>> know
>>> what operations should be done.
>>>
>>> If the old coordinator already executed PREPARED, there are some
>>> transaction which should be ABORT PREPARED.
>>> But if the PREPARED WAL is not sent to the standby, the new
>>> coordinator
>>> can't execute ABORT PREPARED.
>>> And the client can't know which remote servers have PREPARED
>>> transactions which should be ABORTED either.
>>>
>>> Even if the client can know that, only the old coordinator knows its
>>> global transaction id.
>>> Only the database administrator can analyze the old coordinator's log
>>> and then execute the appropriate commands manually, right?
>>
>> I think that's right. In the case of the coordinator crash, the user
>> can look orphaned foreign prepared transactions by checking the
>> 'identifier' column of pg_foreign_xacts on the new standby server and
>> the prepared transactions on the remote servers.
>>
> I think there is a case we can't check orphaned foreign
> prepared transaction in pg_foreign_xacts view on the new standby
> server.
> It confuses users and database administrators.
>
> If the primary coordinator crashes after preparing foreign transaction,
> but before sending XLOG_FDWXACT_INSERT records to the standby server,
> the standby server can't restore their transaction status and
> pg_foreign_xacts view doesn't show the prepared foreign transactions.
>
> To send XLOG_FDWXACT_INSERT records asynchronously leads this problem.

If the primary replicates XLOG_FDWXACT_INSERT to the standby
asynchronously,
some prepared transaction may be unsolved forever.

Since I think to solve this inconsistency manually is hard operation,
we need to support synchronous XLOG_FDWXACT_INSERT replication.

I understood that there are a lot of impact to the performance,
but users can control the consistency/durability vs performance
with synchronous_commit parameter.

What do you think?

> Thank you for letting me know. I've attached the latest version patch
> set.

Thanks for updating.
But, the latest patches failed to be applied to the master branch.

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message osumi.takamichi@fujitsu.com 2020-08-28 09:29:16 RE: extension patch of CREATE OR REPLACE TRIGGER
Previous Message Dilip Kumar 2020-08-28 08:48:05 Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions