Re: Transactions involving multiple postgres foreign servers

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Vinayak Pokale <pokale_vinayak_q3(at)lab(dot)ntt(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Transactions involving multiple postgres foreign servers
Date: 2017-09-26 09:06:51
Message-ID: CAD21AoBsERCqwO65K2cUB6JV_keB3W=bVTwuuZqp3Ubr7pc_Nw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Aug 1, 2017 at 1:40 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Jul 27, 2017 at 8:25 AM, Ashutosh Bapat
> <ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
>> The remote transaction can be committed/aborted only after the fate of
>> the local transaction is decided. If we commit remote transaction and
>> abort local transaction, that's not good. AtEOXact* functions are
>> called immediately after that decision in post-commit/abort phase. So,
>> if we want to commit/abort the remote transaction immediately it has
>> to be done in post-commit/abort processing. Instead if we delegate
>> that to the remote transaction resolver backend (introduced by the
>> patches) the delay between local commit and remote commits depends
>> upon when the resolver gets a chance to run and process those
>> transactions. One could argue that that delay would anyway exist when
>> post-commit/abort processing fails to resolve remote transaction. But
>> given the real high availability these days, in most of the cases
>> remote transaction will be resolved in the post-commit/abort phase. I
>> think we should optimize for most common case. Your concern is still
>> valid, that we shouldn't raise an error or do anything critical in
>> post-commit/abort phase. So we should devise a way to send
>> COMMIT/ABORT prepared messages to the remote server in asynchronous
>> fashion carefully avoiding errors. Recent changes to 2PC have improved
>> performance in that area to a great extent. Relying on resolver
>> backend to resolve remote transactions would erode that performance
>> gain.
>
> I think there are two separate but interconnected issues here. One is
> that if we give the user a new command prompt without resolving the
> remote transaction, then they might run a new query that sees their
> own work as committed, which would be bad. Or, they might commit,
> wait for the acknowledgement, and then tell some other session to go
> look at the data, and find it not there. That would also be bad. I
> think the solution is likely to do something like what we did for
> synchronous replication in commit
> 9a56dc3389b9470031e9ef8e45c95a680982e01a -- wait for the remote
> transaction to be resolved (by the background process) but allow an
> interrupt to escape the wait-loop.
>
> The second issue is that having the resolver resolve transactions
> might be slower than doing it in the foreground. I don't necessarily
> see a reason why that should be a big problem. I mean, the resolver
> might need to establish a separate connection, but if it keeps that
> connection open for a while (say, 5 minutes) in case further
> transactions arrive then it won't be an issue except on really
> low-volume system which isn't really a case I think we need to worry
> about very much. Also, the hand-off to the resolver might take some
> time, but that's equally true for sync rep and we're living with it
> there. Anything else is presumably just the resolver itself being
> inefficient which seems like something that can simply be fixed.
>
> FWIW, I don't think the present resolver implementation is likely to
> be what we want. IIRC, it's just calling an SQL function which
> doesn't seem like a good approach. Ideally we should stick an entry
> into a shared memory queue and then ping the resolver via SetLatch,
> and it can directly invoke an FDW method on the data from the shared
> memory queue. It should be possible to set things up so that a user
> who wishes to do so can run multiple copies of the resolver thread at
> the same time, which would be a good way to keep latency down if the
> system is very busy with distributed transactions.
>

Based on Robert's review comments, I'm planning a major change to the
architecture of this patch so that a backend process works together
with a dedicated background worker that is responsible for resolving
foreign transactions. Usage of the feature will be almost the same as
with the current patch, except for a new GUC parameter that controls
the number of resolver processes to launch. That is, we can run
multiple resolver processes to keep latency down.

From a technical point of view, the processing of a transaction
involving multiple foreign servers will change as follows.

* Backend processes
1. In the PreCommit phase, prepare the transaction on the foreign
servers and save fdw_xact entries into the array in shmem. Also create
an fdw_xact_state entry in the shmem hash that holds the index of each
fdw_xact entry.
2. Local commit/abort.
3. Change its process state to FDWXACT_WAITING and enqueue MyProc
onto the shmem queue.
4. Ping the resolver process via SetLatch.
5. Wait to be woken up.

* Resolver processes
1. Fetch a PGPROC entry from the shmem queue and get its XID (say, XID-a).
2. Get the fdw_xact_state entry from the shmem hash by XID-a.
3. Iterate over the fdw_xact entries using the indexes, and resolve the
foreign transactions.
3-a. If even one foreign transaction fails to resolve, raise an error.
4. Change the waiting backend's state to FDWXACT_COMPLETED and release it.
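
To make the handoff between the two lists above easier to follow, here
is a rough simulation of it in Python threads. It is not PostgreSQL
code: the names WaitState, FdwXact, Backend, backend_commit and
resolver_main are made up for illustration, a queue.Queue stands in for
the shmem queue plus the SetLatch ping, a dict stands in for the shmem
hash, and setting resolved = True stands in for sending COMMIT PREPARED
to the foreign server. Error handling (step 3-a) is left out.

```python
import queue
import threading
from dataclasses import dataclass, field
from enum import Enum, auto


class WaitState(Enum):
    NOT_WAITING = auto()
    WAITING = auto()    # stands in for FDWXACT_WAITING
    COMPLETED = auto()  # stands in for FDWXACT_COMPLETED


@dataclass
class FdwXact:
    server: str
    resolved: bool = False


@dataclass
class Backend:
    xid: int
    state: WaitState = WaitState.NOT_WAITING
    latch: threading.Event = field(default_factory=threading.Event)


fdw_xacts = []              # stand-in for the fdw_xact array in shmem
fdw_xact_state = {}         # stand-in for the shmem hash: xid -> indexes
wait_queue = queue.Queue()  # stand-in for the shmem queue of waiting procs
lock = threading.Lock()     # protects fdw_xacts and fdw_xact_state


def backend_commit(backend, servers):
    # Step 1: "prepare" on each foreign server and record the entries.
    with lock:
        start = len(fdw_xacts)
        fdw_xacts.extend(FdwXact(s) for s in servers)
        fdw_xact_state[backend.xid] = list(range(start, len(fdw_xacts)))
    # Step 2: the local commit/abort would happen here.
    # Step 3: mark ourselves as waiting and enqueue.
    backend.state = WaitState.WAITING
    # Step 4: put() also wakes a resolver blocked in get().
    wait_queue.put(backend)
    # Step 5: sleep until a resolver releases us.
    backend.latch.wait()
    return backend.state


def resolver_main():
    while True:
        backend = wait_queue.get()      # resolver step 1
        if backend is None:             # shutdown sentinel (demo only)
            break
        with lock:
            indexes = fdw_xact_state.pop(backend.xid)  # resolver step 2
            for i in indexes:                          # resolver step 3
                fdw_xacts[i].resolved = True
        backend.state = WaitState.COMPLETED            # resolver step 4
        backend.latch.set()
```

Starting several threads on resolver_main corresponds to running
multiple resolver processes; each waiting backend is picked up by
exactly one of them because they share the same queue.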

Also, the resolver process periodically scans the array of fdw_xact
entries and tries to resolve in-doubt transactions.

This patch still has open design concerns, and I'm planning to update
it for the next commit fest. So I'll mark this as "Waiting on Author".
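
As a rough idea of the periodic scan for in-doubt transactions
mentioned above, the resolver's main loop could wait on its latch with
a timeout and sweep the array each time it wakes up. Again this is only
a Python simulation under my own assumptions: FdwXact, resolve_in_doubt,
resolver_loop, the try_resolve callback and the interval value are all
hypothetical and not taken from the patch.

```python
import threading
from dataclasses import dataclass


@dataclass
class FdwXact:
    server: str
    in_doubt: bool = True


def resolve_in_doubt(entries, try_resolve):
    """Run one sweep over the entries; return how many are still in doubt."""
    remaining = 0
    for entry in entries:
        if not entry.in_doubt:
            continue
        if try_resolve(entry):   # e.g. the foreign server is reachable again
            entry.in_doubt = False
        else:
            remaining += 1       # leave it for the next sweep
    return remaining


def resolver_loop(entries, try_resolve, latch, stop, interval=0.05):
    # Wake up either when pinged or when the interval elapses,
    # then sweep the array for anything still unresolved.
    while not stop.is_set():
        latch.wait(timeout=interval)
        latch.clear()
        resolve_in_doubt(entries, try_resolve)
```

An entry whose server is still unreachable simply stays in doubt and is
retried on the next sweep, so a failure here does not need to raise an
error in anyone's commit path.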

Feedback and suggestions are very welcome.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
