Re: [HACKERS] Transactions involving multiple postgres foreign servers

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Antonin Houska <ah(at)cybertec(dot)at>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Transactions involving multiple postgres foreign servers
Date: 2018-05-11 00:56:42
Message-ID: CAD21AoCUyLrfqx--F7aAHF48ax3M3CY2so1QtWxH9jL47eJT8Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thank you for the comment.

On Fri, May 11, 2018 at 3:57 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Feb 27, 2018 at 2:21 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>> I might be missing your point. As for API breaking, this patch doesn't
>> break any existing FDWs. All new APIs I proposed are dedicated to 2PC.
>> In other words, FDWs that work today can continue working after this
>> patch gets committed, but if FDWs want to support atomic commit then
>> they should be updated to use new APIs. The reason why the calling of
>> FdwXactRegisterForeignServer is necessary is that the core code
>> controls the foreign servers that involved with the transaction but
>> the whether each foreign server uses 2PC option (two_phase_commit) is
>> known only on FDW code. We can eliminate the necessity of calling
>> FdwXactRegisterForeignServer by moving 2PC option from fdw-level to
>> server-level in order not to enforce calling the registering function
>> on FDWs. If we did that, the user can use the 2PC option as a
>> server-level option.
>
> Well, FdwXactRegisterForeignServer has a "bool two_phase_commit"
> argument. If it only needs to be called by FDWs that support 2PC,
> then that argument is unnecessary. If it needs to be called by all
> FDWs, then it breaks existing FDWs that don't call it.
>

I understood now. For now since FdwXactRegisterForeignServer needs to
be called by only FDWs that support 2PC, I will remove the argument.

>>>> But this is now responsible by FDW. I should change it to resolver
>>>> side. That is, FDW can raise error in ordinarily way but core code
>>>> should catch and process it.
>>>
>>> I don't understand exactly what you mean here.
>>
>> Hmm I think I got confused. My understanding is that logging a
>> complaint in commit case and not doing that in abort case if prepared
>> transaction doesn't exist is a core code's job. An API contract here
>> is that FDW raise an error with ERRCODE_UNDEFINED_OBJECT error code if
>> there is no such transaction. Since it's an expected case in abort
>> case for the fdwxact manager, the core code can regard the error as
>> not actual problem.
>
> In general, it's not safe to catch an error and continue unless you
> protect the code that throws the error by a sub-transaction. That
> means we shouldn't expect the FDW to throw an error when the prepared
> transaction isn't found and then just have the core code ignore the
> error. Instead the FDW should return normally and, if the core code
> needs to know whether the prepared transaction was found, then the FDW
> should indicate this through a return value, not an ERROR.
>
>> Or do you mean that FDWs should not raise an error if there is the
>> prepared transaction, and then core code doesn't need to check
>> sqlstate in case of error?
>
> Right. As noted above, that's unsafe, so we shouldn't do it.

Thank you. I will think the API contract again based on your suggestion.

>
>>>> You're right. Perhaps we can deal with it by PrescanFdwXacts until
>>>> reach consistent point, and then have vac_update_datfrozenxid check
>>>> local xids of un-resolved fdwxact to determine the new datfrozenxid.
>>>> Since the local xids of un-resolved fdwxacts would not be relevant
>>>> with vacuuming, we don't need to include it to snapshot and
>>>> GetOldestXmin etc. Also we hint to resolve fdwxact when near
>>>> wraparound.
>>>
>>> I agree with you about snapshots, but I'm not sure about GetOldestXmin.
>>
>> Hmm, although I've thought concern in case where we don't consider
>> local xids of un-resolved fdwxact in GetOldestXmin, I could not find
>> problem. Could you share your concern if you have? I'll try to find a
>> possibility based on it.
>
> I don't remember exactly what I was thinking when I wrote that, but I
> think the point is that GetOldestXmin() does a bunch of things other
> than control the threshold for VACUUM, and we'd need to study them all
> and look for problems. For example, it won't do for an XID to get
> reused while it's still associated with an unresolved fdwxact. We
> therefore certainly need such XIDs to hold back the cluster-wide
> threshold for clog truncation in some manner -- and maybe that
> involves GetOldestXmin(). Or maybe not. But anyway the point,
> broadly considered, is that GetOldestXmin() is used in various ways,
> and I don't know if we've thought through all of the consequences in
> regard to this new feature.

Okay, I'll have GetOldestXmin() consider the oldest local xid of
un-resolved fdwxact as well in the next version patch for more safety,
while considering more efficient ways.

>
> Can I ask what your time frame is for updating this patch?
> Considering how much work appears to remain, if you want to get this
> committed to v12, it would be best to get started as early as
> possible.
>

I'll post an updated patch by PGCon at the latest, hopefully in the next week.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2018-05-11 01:05:44 Re: Global snapshots
Previous Message Tom Lane 2018-05-10 21:42:48 Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)