RE: Transactions involving multiple postgres foreign servers, take 2

From: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
To: 'Robert Haas' <robertmhaas(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Masahiro Ikeda <ikedamsh(at)oss(dot)nttdata(dot)com>, Zhihong Yu <zyu(at)yugabyte(dot)com>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "ashutosh(dot)bapat(dot)oss(at)gmail(dot)com" <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, "amit(dot)kapila16(at)gmail(dot)com" <amit(dot)kapila16(at)gmail(dot)com>, "m(dot)usama(at)gmail(dot)com" <m(dot)usama(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "sulamul(at)gmail(dot)com" <sulamul(at)gmail(dot)com>, "alvherre(at)2ndquadrant(dot)com" <alvherre(at)2ndquadrant(dot)com>, "thomas(dot)munro(at)gmail(dot)com" <thomas(dot)munro(at)gmail(dot)com>, "ildar(at)adjust(dot)com" <ildar(at)adjust(dot)com>, "horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp" <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, "chris(dot)travers(at)adjust(dot)com" <chris(dot)travers(at)adjust(dot)com>, "ishii(at)sraoss(dot)co(dot)jp" <ishii(at)sraoss(dot)co(dot)jp>
Subject: RE: Transactions involving multiple postgres foreign servers, take 2
Date: 2021-06-14 02:04:31
Message-ID: TYAPR01MB2990E3E8A7B42974E7DCBB36FE319@TYAPR01MB2990.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

From: Robert Haas <robertmhaas(at)gmail(dot)com>
> On Thu, Jun 10, 2021 at 9:58 PM tsunakawa(dot)takay(at)fujitsu(dot)com
> <tsunakawa(dot)takay(at)fujitsu(dot)com> wrote:
> > The question I have been asking is how. With that said, we should only have
> > two options; one is the return value of the FDW commit routine, and the other is
> > via ereport(ERROR). I suggested the possibility of the former, because if the
> > FDW does ereport(ERROR), Postgres core (transaction manager) may have
> > difficulty in handling the rest of the participants.
>
> I don't think that is going to work. It is very difficult to write
> code that doesn't ever ERROR in PostgreSQL. It is not impossible if
> the operation is trivial enough, but I think you're greatly
> underestimating the complexity of committing the remote transaction.
> If somebody had designed PostgreSQL so that every function returns a
> return code and every time you call some other function you check that
> return code and pass any error up to your own caller, then there would
> be no problem here. But in fact the design was that at the first sign
> of trouble you throw an ERROR. It's not easy to depart from that
> programming model in just one place.

> > I'm not completely sure about this. I thought (and said) that the only thing
> > the FDW does would be to send a commit request through an existing
> > connection. So, I think it's not a severe restriction to require FDWs not to do
> > ereport(ERROR) during commits (of the second phase of 2PC.)
>
> To send a commit request through an existing connection, you have to
> send some bytes over the network using a send() or write() system
> call. That can fail. Then you have to read the response back over the
> network using recv() or read(). That can also fail. You also need to
> parse the result that you get from the remote side, which can also
> fail, because you could get back garbage for some reason. And
> depending on the details, you might first need to construct the
> message you're going to send, which might be able to fail too. Also,
> the data might be encrypted using SSL, so you might have to decrypt
> it, which can also fail, and you might need to encrypt data before
> sending it, which can fail. In fact, if you're using OpenSSL,
> trying to call SSL_read() or SSL_write() can both read and write data
> from the socket, even multiple times, so you have extra opportunities
> to fail.

I know sending a commit request may get an error from various underlying functions, but we're talking about the client side, not Postgres's server side, which could unexpectedly ereport(ERROR) somewhere. So, the new FDW commit routine won't lose control and can return an error code as its return value. For instance, the FDW commit routine for DBMS-X would typically look like this:

int
DBMSXCommit(...)
{
    int     ret;

    /* extract info from the argument to pass to xa_commit() */

    /*
     * DBMSX_xa_commit() is the actual commit function, which is exposed to
     * the app server (e.g., Tuxedo) through the xa_commit() interface.
     */
    ret = DBMSX_xa_commit(...);

    /*
     * Map xa_commit() return values to the corresponding return values
     * of the FDW commit routine.
     */
    switch (ret)
    {
        case XA_RMERR:
            ret = ...;
            break;
        ...
    }

    return ret;
}
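
Conversely, if an FDW were allowed to ereport(ERROR) at commit, the transaction manager in core would have to trap that error to keep control and process the remaining participants. A minimal sketch of what such trapping could look like (this is only my illustration, not code from the patch; the participant type and the callback name are made up):

#include "postgres.h"

/*
 * Illustration only: call one participant's commit callback and convert
 * any ereport(ERROR) into a return code, so the caller's loop over the
 * remaining participants can continue.
 */
static int
CommitOneParticipantNoThrow(ParticipantInfo *participant)  /* made-up type */
{
    MemoryContext oldcxt = CurrentMemoryContext;
    int         ret = 0;

    PG_TRY();
    {
        /* made-up callback name */
        participant->fdwroutine->CommitForeignTransaction(participant);
    }
    PG_CATCH();
    {
        /*
         * Swallow the error so control returns here.  A real implementation
         * would also have to worry about resource cleanup, which is exactly
         * the difficulty Robert points out.
         */
        MemoryContextSwitchTo(oldcxt);
        FlushErrorState();
        ret = -1;
    }
    PG_END_TRY();

    return ret;
}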

> I think that's a valid concern, but we also have to have a plan that
> is realistic. Some things are indeed not possible in PostgreSQL's
> design. Also, some of these problems are things everyone has to
> somehow confront. There's no database doing 2PC that can't have a
> situation where one of the machines disappears unexpectedly due to
> some natural disaster or administrator interference. It might be the
> case that our inability to do certain things safely during transaction
> commit puts us out of compliance with the spec, but it can't be the
> case that some other system has no possible failures during
> transaction commit. The problem of the network potentially being
> disconnected between one packet and the next exists in every system.

So, we need to design how commit behaves from the user's perspective. That's the functional design. We should figure out what the desirable commit response is first, and then see whether we can implement it or have to compromise in some way. I think we can reference the X/Open TX standard and/or the JTS (Java Transaction Service) specification (I haven't had a chance to read them yet, though.) If we can't find the required commit behavior for the volcano case (a participant server suddenly destroyed) in those specifications, then (I'm hesitant to say this because it may be hard) it's desirable to follow representative products such as Tuxedo and GlassFish (the reference implementation of the Java EE specs.)

> > I don't think the resolver-based approach would bring us far enough. It's
> > fundamentally a bottleneck. Such a background process should only handle
> > commits whose requests failed to be sent because the server was down.
>
> Why is it fundamentally a bottleneck? It seems to me in some cases it
> could scale better than any other approach. If we have to commit on
> 100 shards in only one process we can only do those commits one at a
> time. If we can use resolver processes we could do all 100 at once if
> the user can afford to run that many resolvers, which should be way
> faster. It is true that if the resolver does not have a connection
> open and must open one, that might be slow, but presumably after that
> it can keep the connection open and reuse it for subsequent
> distributed transactions. I don't really see why that should be
> particularly slow.

Concurrent transactions are serialized at the resolver. I heard that the current patch handles 2PC like this: the TM (transaction manager in Postgres core) requests prepare to the resolver, the resolver sends prepare to the remote server and waits for a reply, the TM gets back control from the resolver, the TM requests commit to the resolver, the resolver sends commit to the remote server and waits for a reply, and the TM gets back control. The resolver handles one transaction at a time.
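
In other words, each request occupies the resolver for a full network round trip. A rough sketch of that serialization as I understand it (the function names here are made up for illustration; this is not the patch's actual code):

/*
 * Rough sketch of a single resolver's main loop.  Each request blocks the
 * resolver for one network round trip, so requests from concurrent
 * backends queue up behind one another.
 */
for (;;)
{
    FdwXactRequest *req = DequeueNextRequest();  /* made-up queue API */

    SendToForeignServer(req);   /* PREPARE TRANSACTION or COMMIT PREPARED */
    WaitForReply(req);          /* round trip; other backends must wait */
    NotifyRequestingBackend(req);
}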

In regard to the case where one session has to commit on multiple remote servers, we're talking about an asynchronous interface just like the one the XA standard provides.
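
For postgres_fdw specifically, libpq's asynchronous query interface already allows one backend to issue the second-phase commits to all participants in parallel and then collect the replies. A minimal sketch, assuming the caller has already built the connection array and the per-server COMMIT PREPARED command strings:

#include <libpq-fe.h>

/*
 * Minimal sketch: fire COMMIT PREPARED at every participant first, then
 * collect the replies, instead of one blocking round trip per server.
 * conns[], commands[], and nservers are assumed to be set up by the caller.
 */
static int
commit_prepared_all(PGconn **conns, char **commands, int nservers)
{
    int     nfailed = 0;

    /* phase 1: send all requests without waiting for replies */
    for (int i = 0; i < nservers; i++)
    {
        if (!PQsendQuery(conns[i], commands[i]))  /* e.g. "COMMIT PREPARED 'gid'" */
            nfailed++;
    }

    /* phase 2: reap the replies; the round trips overlap */
    for (int i = 0; i < nservers; i++)
    {
        PGresult *res;

        while ((res = PQgetResult(conns[i])) != NULL)
        {
            if (PQresultStatus(res) != PGRES_COMMAND_OK)
                nfailed++;      /* remember, but keep draining other servers */
            PQclear(res);
        }
    }

    return nfailed;
}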

Regards
Takayuki Tsunakawa
