Re: Transactions involving multiple postgres foreign servers, take 2

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: tsunakawa(dot)takay(at)fujitsu(dot)com
Cc: masahiko(dot)sawada(at)2ndquadrant(dot)com, sawada(dot)mshk(at)gmail(dot)com, masao(dot)fujii(at)oss(dot)nttdata(dot)com, amit(dot)kapila16(at)gmail(dot)com, m(dot)usama(at)gmail(dot)com, ikedamsh(at)oss(dot)nttdata(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, sulamul(at)gmail(dot)com, alvherre(at)2ndquadrant(dot)com, thomas(dot)munro(at)gmail(dot)com, ildar(at)adjust(dot)com, horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp, chris(dot)travers(at)adjust(dot)com, robertmhaas(at)gmail(dot)com, ishii(at)sraoss(dot)co(dot)jp
Subject: Re: Transactions involving multiple postgres foreign servers, take 2
Date: 2020-10-09 05:55:14
Message-ID: 20201009.145514.78253792462097980.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Fri, 9 Oct 2020 02:33:37 +0000, "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com> wrote in
> From: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
> > What about temporary network failures? I think there are users who
> > don't want to give up resolving foreign transactions failed due to a
> > temporary network failure. Or even they might want to wait for
> > transaction completion until they send a cancel request. If we want to
> > call the commit routine only once and therefore want FDW to retry
> > connecting the foreign server within the call, it means we require all
> > FDW implementors to write a retry loop code that is interruptible and
> > ensures not to raise an error, which increases difficulty.
> >
> > Yes, but if we don’t retry to resolve foreign transactions at all on
> > an unreliable network environment, the user might end up requiring
> > every transaction to check the status of foreign transactions of the
> > previous distributed transaction before starts. If we allow to do
> > retry, I guess we ease that somewhat.
>
> OK. As I said, I'm not against trying to cope with temporary network failure. I just don't think it's mandatory. If the network failure is really temporary and thus recovers soon, then the resolver will be able to commit the transaction soon, too.

I should missing something, though...

I don't understand why we hate ERRORs from fdw-2pc-commit routine so
much. I think remote-commits should be performed before local commit
passes the point-of-no-return and the v26-0002 actually places
AtEOXact_FdwXact() before the critical section.

(FWIW, I think remote commits should be performed by backends, not by
another process, because backends should wait for all remote-commits
to end anyway and it is simpler. If we want to multiple remote-commits
in parallel, we could do that by adding some async-waiting interface.)

> Then, we can have a commit retry timeout or retry count like the following WebLogic manual says. (I couldn't quickly find the English manual, so below is in Japanese. I quoted some text that got through machine translation, which appears a bit strange.)
>
> https://docs.oracle.com/cd/E92951_01/wls/WLJTA/trxcon.htm
> --------------------------------------------------
> Abandon timeout
> Specifies the maximum time (in seconds) that the transaction manager attempts to complete the second phase of a two-phase commit transaction.
>
> In the second phase of a two-phase commit transaction, the transaction manager attempts to complete the transaction until all resource managers indicate that the transaction is complete. After the abort transaction timer expires, no attempt is made to resolve the transaction. If the transaction enters a ready state before it is destroyed, the transaction manager rolls back the transaction and releases the held lock on behalf of the destroyed transaction.
> --------------------------------------------------

That's not a retry timeout but a timeout for total time of all
2nd-phase-commits. But I think it would be sufficient. Even if an
fdw could retry 2pc-commit, it's a matter of that fdw and the core has
nothing to do with.

> > Also, what if the user sets the statement timeout to 60 sec and they
> > want to cancel the waits after 5 sec by pressing ctl-C? You mentioned
> > that client libraries of other DBMSs don't have asynchronous execution
> > functionality. If the SQL execution function is not interruptible, the
> > user will end up waiting for 60 sec, which seems not good.

I think fdw-2pc-commit can be interruptible safely as far as we run
the remote commits before entring critical section of local commit.

> FDW functions can be uninterruptible in general, aren't they? We experienced that odbc_fdw didn't allow cancellation of SQL execution.

At least postgres_fdw is interruptible while waiting the remote.

create view lt as select 1 as slp from (select pg_sleep(10)) t;
create foreign table ft(slp int) server sv1 options (table_name 'lt');
select * from ft;
^CCancel request sent
ERROR: canceling statement due to user request

regrds.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Deng, Gang 2020-10-09 06:09:48 RE: [PoC] Non-volatile WAL buffer
Previous Message Daniel Westermann (DWE) 2020-10-09 05:44:57 Re: Wrong example in the bloom documentation