Re: [HACKERS] Transactions involving multiple postgres foreign servers

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Antonin Houska <ah(at)cybertec(dot)at>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Transactions involving multiple postgres foreign servers
Date: 2017-12-27 05:38:28
Message-ID: CAD21AoB0M2Zo7aXcJVJQ_MuM6CmrZJGvaGikjhMHMzR7HeSPGg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Dec 13, 2017 at 10:47 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> On Wed, Dec 13, 2017 at 12:03 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Mon, Dec 11, 2017 at 5:20 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>>> The question I have is how we would deal with a foreign server that is
>>>> unavailable for a long time due to a crash, a long network outage,
>>>> etc. For example, the foreign server crashed or got disconnected after
>>>> PREPARE but before COMMIT/ROLLBACK was issued. The backend will remain
>>>> blocked for a long time without the user having any idea of what's
>>>> going on. Maybe we should add some timeout.
>>>
>>> After more thought, I agree with adding some timeout. I can imagine
>>> there are users who want the timeout, for example, those who cannot
>>> accept even a few seconds of latency. If the timeout occurs, the backend
>>> unlocks the foreign transactions and breaks the loop. The resolver
>>> process will continue to resolve foreign transactions at a certain interval.
>>
>> I don't think a timeout is a very good idea. There is no timeout for
>> synchronous replication and the issues here are similar. I will not
>> try to block a patch adding a timeout, but I think it had better be
>> disabled by default and have very clear documentation explaining why
>> it's really dangerous. And this is why: with no timeout, you can
>> count on being able to see the effects of your own previous
>> transactions, unless at some point you sent a query cancel or got
>> disconnected. With a timeout, you may or may not see the effects of
>> your own previous transactions depending on whether or not you hit the
>> timeout, which you have no sure way of knowing.
>>
>>>>> transactions after the coordinator server recovered. On the other
>>>>> hand, to give subsequent reads a consistent result in such a
>>>>> situation, we could, for example, disallow backends from issuing SQL
>>>>> to a foreign server while a foreign transaction on that server
>>>>> remains unresolved.
>>>>
>>>> +1 for the last sentence. If we do that, we don't need the backend to
>>>> be blocked by resolver since a subsequent read accessing that foreign
>>>> server would get an error and not inconsistent data.
>>>
>>> Yeah, however the disadvantage of this is that we manage foreign
>>> transactions per foreign server. If a transaction that modified even
>>> one table remains as an in-doubt transaction, we cannot issue any
>>> SQL that touches that foreign server. Can we raise an error at
>>> ExecInitForeignScan()?
>>
>> I really feel strongly we shouldn't complicate the initial patch with
>> this kind of thing. Let's make it enough for this patch to guarantee
>> that either all parts of the transaction commit eventually or they all
>> abort eventually. Ensuring consistent visibility is a different and
>> hard project, and if we try to do that now, this patch is not going to
>> be done any time soon.
>>
>
> Thank you for the suggestion.
>
> I was really wondering if we should add a timeout to this feature.
> It's a common concern that we want to put a timeout on a critical
> section. But currently we don't have such a timeout for either
> synchronous replication or writing WAL. I can imagine there will be
> users who want a timeout for such cases, but obviously it makes this
> feature more complex. Anyway, even if we add a timeout to this feature,
> we can make it a separate patch and feature. So I'd like to keep
> it simple as a first step. This patch guarantees that the transaction
> is either committed or rolled back on all foreign servers, unless the
> user cancels.
>
> Regards,
>
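To make the trade-off discussed above concrete, here is a rough sketch in
Python. It is only a toy model, not code from the patches: the ForeignServer
class, the "tick" counter and wait_for_resolution() are hypothetical names
made up for illustration. It shows that a backend waiting without a timeout
returns only once the prepared transaction is resolved, while a backend with
a timeout can return while the transaction is still in doubt.

```python
class ForeignServer:
    """Hypothetical foreign server holding one prepared (in-doubt) transaction."""

    def __init__(self, down_for_ticks):
        self.down_for_ticks = down_for_ticks  # simulated outage length after PREPARE
        self.committed = False

    def try_commit_prepared(self):
        # One resolution attempt per tick; fails while the server is unreachable.
        if self.down_for_ticks > 0:
            self.down_for_ticks -= 1
            return False
        self.committed = True
        return True


def wait_for_resolution(server, timeout_ticks=None):
    """Return True if the transaction was resolved before the backend gave up."""
    waited = 0
    while not server.try_commit_prepared():
        waited += 1
        if timeout_ticks is not None and waited >= timeout_ticks:
            # The backend stops waiting; in this model a resolver would retry later.
            return False
    return True
```

With no timeout the caller can rely on server.committed being true when the
call returns. With a timeout, whether a later read sees the transaction's
effects depends on how long the outage happened to last.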

I've updated the documentation in the patches and fixed some bugs. I also
ran failure tests of this feature using a fault simulation tool[1] for
PostgreSQL that I created.

The 0001 patch adds a mechanism to keep track of writes on the local
server. This is required to determine whether we should use 2PC at commit.
The 0002 patch is the main part. It adds a distributed transaction manager
(currently only for atomic commit), APIs for 2PC, and a foreign
transaction resolver process. The 0003 patch makes postgres_fdw
support atomic commit using 2PC.

Please review the patches.

[1] https://github.com/MasahikoSawada/pg_simula

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment Content-Type Size
0001-Keep-track-of-local-writes_v14.patch application/octet-stream 4.0 KB
0002-Support-atomic-commit-involving-multiple-foreign-ser_v14.patch application/octet-stream 161.9 KB
0003-postgres_fdw-supports-atomic-distributed-transaction_v14.patch application/octet-stream 48.5 KB
