Re: Transactions involving multiple postgres foreign servers, take 2

From: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
To: Masahiro Ikeda <ikedamsh(at)oss(dot)nttdata(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Muhammad Usama <m(dot)usama(at)gmail(dot)com>, amul sul <sulamul(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Álvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Ildar Musin <ildar(at)adjust(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Chris Travers <chris(dot)travers(at)adjust(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>, Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
Subject: Re: Transactions involving multiple postgres foreign servers, take 2
Date: 2020-07-16 04:16:50
Message-ID: CA+fd4k6cvbiyhLVS4DNQ=yDHs4TD0pj+Zr4MpttddvEFFmr5ag@mail.gmail.com
Lists: pgsql-hackers

On Tue, 14 Jul 2020 at 17:24, Masahiro Ikeda <ikedamsh(at)oss(dot)nttdata(dot)com> wrote:
>
> > I've attached the latest version patches. I've incorporated the review
> > comments I got so far and improved locking strategy.
>
> I want to ask a question about streaming replication with 2PC.
> Are you going to support 2PC with streaming replication?
>
> I tried streaming replication using v23 patches.
> I confirm that 2PC works with streaming replication,
> which there are primary/standby coordinator.
>
> But, in my understanding, the WAL of "PREPARE" and
> "COMMIT/ABORT PREPARED" can't be replicated to the standby server in
> sync.
>
> If this is right, the unresolved transaction can be occurred.
>
> For example,
>
> 1. PREPARE is done
> 2. crash primary before the WAL related to PREPARE is
> replicated to the standby server
> 3. promote standby server // but can't execute "ABORT PREPARED"
>
> In above case, the remote server has the unresolved transaction.
> Can we solve this problem to support in-sync replication?
>
> But, I think some users use async replication for performance.
> Do we need to document the limitation or make another solution?
>

IIUC, with synchronous replication we can guarantee that WAL records
have been written on both the primary and the replicas by the time the
client gets an acknowledgment of commit. We don't replicate each WAL
record generated during a transaction one by one in sync. In the case
you described, the client will get an error due to the server crash.
Therefore I think the user cannot expect that the WAL records generated
so far have been replicated. The same issue could also happen when the
user executes PREPARE TRANSACTION and the server crashes. To prevent
this issue, I think we would need to send each WAL record in sync, but
I'm not sure that's reasonable behavior. As long as we write WAL
locally and then send it to the replicas, we would need a smart
mechanism to prevent this situation.
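
To illustrate the point, here is a toy model in Python. It is only a
sketch, not PostgreSQL code: the Primary and Standby classes and their
methods are made up for illustration, and it assumes the worst case in
which nothing is shipped until the commit point. In a real system WAL
is streamed continuously, so the records may or may not have arrived.

```python
from dataclasses import dataclass, field


@dataclass
class Standby:
    wal: list = field(default_factory=list)


@dataclass
class Primary:
    standby: Standby
    wal: list = field(default_factory=list)
    shipped: int = 0  # count of WAL records already sent to the standby

    def write(self, record):
        # WAL is written locally first; this step never waits for the standby.
        self.wal.append(record)

    def ship(self):
        # Send every record the standby has not seen yet.
        self.standby.wal.extend(self.wal[self.shipped:])
        self.shipped = len(self.wal)

    def commit(self):
        # The only synchronous point in this model: ship, then acknowledge.
        self.write("COMMIT")
        self.ship()
        return "ack"


def crash_before_ack():
    # Hypothetical scenario: the primary crashes right after a local write.
    standby = Standby()
    primary = Primary(standby)
    primary.write("PREPARE")
    # Crash here: the client got no ack, and nothing was shipped.
    return standby.wal


def acknowledged_commit():
    # Hypothetical scenario: the client receives the commit acknowledgment.
    standby = Standby()
    primary = Primary(standby)
    primary.write("INSERT")
    ack = primary.commit()
    return ack, standby.wal
```

In this model, crash_before_ack() leaves the standby without the
PREPARE record, while acknowledged_commit() returns only after the
standby holds every record up to the commit.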

Related to the point Ikeda-san raised, I realized that with the
current patch the backend waits for synchronous replication and then
waits for foreign transaction resolution. The order should be
reversed. Otherwise, it could lead to data loss even when the client
got an acknowledgment of commit. Also, when the user is using both
atomic commit and synchronous replication and wants to cancel waiting,
he/she will need to press Ctrl-C twice with the current patch, which
should also be fixed.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
