|From:||Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>|
|To:||Robert Haas <robertmhaas(at)gmail(dot)com>|
|Cc:||Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Antonin Houska <ah(at)cybertec(dot)at>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>|
|Subject:||Re: [HACKERS] Transactions involving multiple postgres foreign servers|
|Views:||Raw Message | Whole Thread | Download mbox|
On Wed, Dec 13, 2017 at 10:47 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> On Wed, Dec 13, 2017 at 12:03 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Mon, Dec 11, 2017 at 5:20 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>>> The question I have is how would we deal with a foreign server that is
>>>> not available for longer duration due to crash, longer network outage
>>>> etc. Example is the foreign server crashed/got disconnected after
>>>> PREPARE but before COMMIT/ROLLBACK was issued. The backend will remain
>>>> blocked for much longer duration without user having an idea of what's
>>>> going on. May be we should add some timeout.
>>> After more thought, I agree with adding some timeout. I can image
>>> there are users who want the timeout, for example, who cannot accept
>>> even a few seconds latency. If the timeout occurs backend unlocks the
>>> foreign transactions and breaks the loop. The resolver process will
>>> keep to continue to resolve foreign transactions at certain interval.
>> I don't think a timeout is a very good idea. There is no timeout for
>> synchronous replication and the issues here are similar. I will not
>> try to block a patch adding a timeout, but I think it had better be
>> disabled by default and have very clear documentation explaining why
>> it's really dangerous. And this is why: with no timeout, you can
>> count on being able to see the effects of your own previous
>> transactions, unless at some point you sent a query cancel or got
>> disconnected. With a timeout, you may or may not see the effects of
>> your own previous transactions depending on whether or not you hit the
>> timeout, which you have no sure way of knowing.
>>>>> transactions after the coordinator server recovered. On the other
>>>>> hand, for the reading a consistent result on such situation by
>>>>> subsequent reads, for example, we can disallow backends to inquiry SQL
>>>>> to the foreign server if a foreign transaction of the foreign server
>>>>> is remained.
>>>> +1 for the last sentence. If we do that, we don't need the backend to
>>>> be blocked by resolver since a subsequent read accessing that foreign
>>>> server would get an error and not inconsistent data.
>>> Yeah, however the disadvantage of this is that we manage foreign
>>> transactions per foreign servers. If a transaction that modified even
>>> one table is remained as a in-doubt transaction, we cannot issue any
>>> SQL that touches that foreign server. Can we occur an error at
>> I really feel strongly we shouldn't complicate the initial patch with
>> this kind of thing. Let's make it enough for this patch to guarantee
>> that either all parts of the transaction commit eventually or they all
>> abort eventually. Ensuring consistent visibility is a different and
>> hard project, and if we try to do that now, this patch is not going to
>> be done any time soon.
> Thank you for the suggestion.
> I was really wondering if we should add a timeout to this feature.
> It's a common concern that we want to put a timeout at critical
> section. But currently we don't have such timeout to neither
> synchronous replication or writing WAL. I can image there will be
> users who want to a timeout for such cases but obviously it makes this
> feature more complex. Anyway, even if we add a timeout to this feature
> we can make it as a separated patch and feature. So I'd like to keep
> it simple as first step. This patch guarantees that the transaction
> commit or rollback on all foreign servers or not unless users doesn't
I've updated documentation of patches, and fixed some bugs. I did some
failure tests of this feature using a fault simulation tool for
PostgreSQL that I created.
0001 patch adds a mechanism to track of writes on local server. This
is required to determine whether we should use 2pc at commit. 0002
patch is the main part. It adds a distributed transaction manager
(currently only for atomic commit), APIs for 2pc and foreign
transaction manager resolver process. 0003 patch makes postgres_fdw
support atomic commit using 2pc.
Please review patches.
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
|Next Message||Robert Haas||2017-12-27 06:30:08||Re: Should we nonblocking open FIFO files in COPY?|
|Previous Message||Michael Paquier||2017-12-27 04:10:06||Re: [HACKERS] taking stdbool.h into use|