Re: Transactions involving multiple postgres foreign servers, take 2

From: Masahiro Ikeda <ikedamsh(at)oss(dot)nttdata(dot)com>
To: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Muhammad Usama <m(dot)usama(at)gmail(dot)com>, amul sul <sulamul(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Álvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Ildar Musin <ildar(at)adjust(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Chris Travers <chris(dot)travers(at)adjust(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>, Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
Subject: Re: Transactions involving multiple postgres foreign servers, take 2
Date: 2020-07-15 11:58:13
Message-ID: 412f81780e15cfb6b3d4905db9000785@oss.nttdata.com
Lists: pgsql-hackers

On 2020-07-15 15:06, Masahiko Sawada wrote:
> On Tue, 14 Jul 2020 at 09:08, Masahiro Ikeda <ikedamsh(at)oss(dot)nttdata(dot)com>
> wrote:
>>
>> > I've attached the latest version patches. I've incorporated the review
>> > comments I got so far and improved locking strategy.
>>
>> Thanks for updating the patch!
>> I have three questions about the v23 patches.
>>
>>
>> 1. messages related to user canceling
>>
>> As I understand it, there are two messages
>> that can be output when a user cancels the COMMIT command.
>>
>> A. When PREPARE fails, the output says that the transaction
>> committed locally, but an error also occurred.
>>
>> ```
>> postgres=*# COMMIT;
>> ^CCancel request sent
>> WARNING: canceling wait for resolving foreign transaction due to user request
>> DETAIL: The transaction has already committed locally, but might not have been committed on the foreign server.
>> ERROR: server closed the connection unexpectedly
>> This probably means the server terminated abnormally
>> before or while processing the request.
>> CONTEXT: remote SQL command: PREPARE TRANSACTION 'fx_1020791818_519_16399_10'
>> ```
>>
>> B. When PREPARE succeeds,
>> the output shows that the transaction committed locally.
>>
>> ```
>> postgres=*# COMMIT;
>> ^CCancel request sent
>> WARNING: canceling wait for resolving foreign transaction due to user request
>> DETAIL: The transaction has already committed locally, but might not have been committed on the foreign server.
>> COMMIT
>> ```
>>
>> In case A, I think the "committed locally" message can confuse
>> users, because although the message says the transaction committed,
>> it was actually ABORTED.
>>
>> I suppose "committed" here means that the ABORT was committed locally,
>> but isn't there a possibility of misunderstanding?
>
> No, you're right. I'll fix it in the next version patch.
>
> I think synchronous replication also has the same problem. It says
> "the transaction has already committed" but it's not true when
> executing ROLLBACK PREPARED.

Thanks for replying and sharing the synchronous replication problem.
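To make the point about case (A) concrete, here is a minimal Python model (not the patch's C code) of choosing the DETAIL text from the actual local outcome instead of always saying "committed". `LocalOutcome` and `cancel_wait_detail` are hypothetical names, and the "aborted" wording is only a placeholder; the "committed" wording is copied from the output quoted above.

```python
from enum import Enum


class LocalOutcome(Enum):
    """Hypothetical: how the local transaction ended before the wait was canceled."""
    COMMITTED = "committed"
    ABORTED = "aborted"


def cancel_wait_detail(outcome: LocalOutcome) -> str:
    """Return a DETAIL line that reports the real local outcome.

    The committed wording matches the v23 output; the aborted wording
    is a hypothetical alternative, not taken from any patch.
    """
    if outcome is LocalOutcome.COMMITTED:
        return ("The transaction has already committed locally, but might not "
                "have been committed on the foreign server.")
    return ("The transaction has already aborted locally, but might not "
            "have been aborted on the foreign server.")
```

In case (A) the transaction was rolled back after PREPARE failed, so the caller would pass `LocalOutcome.ABORTED` and the message would no longer claim a local commit.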

> BTW how did you test the case (A)? It says canceling wait for foreign
> transaction resolution but the remote SQL command is PREPARE
> TRANSACTION.

I think the timing of the failure is important when testing 2PC.
Since I don't have a good way to simulate failures flexibly,
I use the GDB debugger.

The message in case (A) is emitted
after performing the following operations.

1. Attach the debugger to a backend process.
2. Set a breakpoint at PreCommit_FdwXact() in CommitTransaction().
// This is before PREPARE.
3. Execute "BEGIN" and insert data into two remote foreign tables.
4. Issue a "COMMIT" command.
5. The backend process stops at the breakpoint.
6. Stop one of the remote foreign servers.
7. Detach the debugger.
// The backend continues and PREPARE fails. The transaction resolver
// then tries to abort all remote transactions.
// It's unnecessary to resolve remote transactions whose PREPARE
// failed, isn't it?
8. Send a cancel request.

BTW, I'm concerned about how to test the 2PC patches.
There are many failure patterns: the timing of the failure, which
server or network link fails (and unexpected recovery), and
combinations of those...

Although it would be best to test these failure patterns automatically,
I have no good idea how to do that yet, so I'm checking some patterns manually.
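To illustrate how quickly the combinations grow, here is a small Python sketch that enumerates a test matrix as timing x failing component(s) x recovery behaviour. The specific timing points, component names, and recovery states are illustrative assumptions, not an agreed list of injection points for the patch.

```python
from itertools import combinations, product

# Illustrative failure-injection dimensions (assumed, not from the patch).
TIMINGS = (
    "before PREPARE TRANSACTION",
    "after PREPARE TRANSACTION",
    "after local commit",
    "during COMMIT PREPARED",
)
TARGETS = ("foreign server 1", "foreign server 2", "network", "coordinator")
RECOVERY = ("stays down", "recovers unexpectedly")


def failure_sets(max_simultaneous):
    """Yield every non-empty set of targets that fail together, up to the given size."""
    for k in range(1, max_simultaneous + 1):
        yield from combinations(TARGETS, k)


def scenarios(max_simultaneous=1):
    """Cartesian product of timing, failing targets, and recovery behaviour."""
    return list(product(TIMINGS, failure_sets(max_simultaneous), RECOVERY))
```

With one failure at a time this gives 4 x 4 x 2 = 32 cases; allowing two simultaneous failures gives 4 x 10 x 2 = 80, which is why checking them by hand with GDB does not scale.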

> I've incorporated your comments above into my local branch. I'll
> post the latest version of the patch soon, after incorporating other comments.

OK, thanks.

Regards,

--
Masahiro Ikeda
NTT DATA CORPORATION