Re: Transactions involving multiple postgres foreign servers, take 2

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Masahiro Ikeda <ikedamsh(at)oss(dot)nttdata(dot)com>, Zhihong Yu <zyu(at)yugabyte(dot)com>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "ashutosh(dot)bapat(dot)oss(at)gmail(dot)com" <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, "amit(dot)kapila16(at)gmail(dot)com" <amit(dot)kapila16(at)gmail(dot)com>, "m(dot)usama(at)gmail(dot)com" <m(dot)usama(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "sulamul(at)gmail(dot)com" <sulamul(at)gmail(dot)com>, "alvherre(at)2ndquadrant(dot)com" <alvherre(at)2ndquadrant(dot)com>, "thomas(dot)munro(at)gmail(dot)com" <thomas(dot)munro(at)gmail(dot)com>, "ildar(at)adjust(dot)com" <ildar(at)adjust(dot)com>, "horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp" <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, "chris(dot)travers(at)adjust(dot)com" <chris(dot)travers(at)adjust(dot)com>, "ishii(at)sraoss(dot)co(dot)jp" <ishii(at)sraoss(dot)co(dot)jp>
Subject: Re: Transactions involving multiple postgres foreign servers, take 2
Date: 2021-06-11 12:50:42
Message-ID: CA+TgmoZ7=MufRiz7hmQPN=JF-QYt+H2TooGNcxsYUDa=nucbqg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jun 10, 2021 at 9:58 PM tsunakawa(dot)takay(at)fujitsu(dot)com
<tsunakawa(dot)takay(at)fujitsu(dot)com> wrote:
> I understand that. As I cited yesterday and possibly before, that's why xa_commit() returns various return codes. So, I have never suggested that FDWs should not report an error and always report success for the commit request. They should be allowed to report an error.

In the text to which I was responding it seemed like you were saying
the opposite. Perhaps I misunderstood.

> The question I have been asking is how. With that said, we should only have two options; one is the return value of the FDW commit routine, and the other is via ereport(ERROR). I suggested the possibility of the former, because if the FDW does ereport(ERROR), Postgres core (transaction manager) may have difficulty in handling the rest of the participants.

I don't think that is going to work. It is very difficult to write
code that doesn't ever ERROR in PostgreSQL. It is not impossible if
the operation is trivial enough, but I think you're greatly
underestimating the complexity of committing the remote transaction.
If somebody had designed PostgreSQL so that every function returns a
return code and every time you call some other function you check that
return code and pass any error up to your own caller, then there would
be no problem here. But in fact the design was that at the first sign
of trouble you throw an ERROR. It's not easy to depart from that
programming model in just one place.

> > Also, leaving aside theoretical arguments, I think it's not
> > realistically possible for an FDW author to write code to commit a
> > prepared transaction that will be safe in the context of running late
> > in PrepareTransaction(), after we've already done
> > RecordTransactionCommit(). Such code can't avoid throwing errors
> > because it can't avoid performing operations and allocating memory.
>
> I'm not completely sure about this. I thought (and said) that the only thing the FDW does would be to send a commit request through an existing connection. So, I think it's not a severe restriction to require FDWs to do ereport(ERROR) during commits (of the second phase of 2PC.)

To send a commit request through an existing connection, you have to
send some bytes over the network using a send() or write() system
call. That can fail. Then you have to read the response back over the
network using recv() or read(). That can also fail. You also need to
parse the result that you get from the remote side, which can also
fail, because you could get back garbage for some reason. And
depending on the details, you might first need to construct the
message you're going to send, which might be able to fail too. Also,
the data might be encrypted using SSL, so you might have to decrypt
it, which can also fail, and you might need to encrypt data before
sending it, which can fail. In fact, if you're using the OpenSSL,
trying to call SSL_read() or SSL_write() can both read and write data
from the socket, even multiple times, so you have extra opportunities
to fail.

> (I took "abort" as the same as "rollback" here.) Once we've sent commit requests to some participants, we can't abort the transaction. If one FDW returned an error halfway, we need to send commit requests to the rest of participants.

I understand that it's not possible to abort the local transaction to
abort after it's been committed, but that doesn't mean that we're
going to be able to send the commit requests to the rest of the
participants. We want to be able to do that, certainly, but there's no
guarantee that it's actually possible. Again, the remote servers may
be dropped into a volcano, or less seriously, we may not be able to
access them. Also, someone may kill off our session.

> It's a design question, as I repeatedly said, whether and how we should report the error of some participants to the client. For instance, how should we report the errors of multiple participants? Concatenate those error messages?

Sure, I agree that there are some questions about how to report errors.

> Anyway, we should design the interface first, giving much thought and respecting the ideas of predecessors (TX/XA, MS DTC, JTA/JTS). Otherwise, we may end up like "We implemented like this, so the interface is like this and it can only behave like this, although you may find it strange..." That might be a situation similar to what your comment "the design of PostgreSQL, in all circumstances, the way you recover from an error is to abort the transaction" suggests -- Postgres doesn't have statement-level rollback.

I think that's a valid concern, but we also have to have a plan that
is realistic. Some things are indeed not possible in PostgreSQL's
design. Also, some of these problems are things everyone has to
somehow confront. There's no database doing 2PC that can't have a
situation where one of the machines disappears unexpectedly due to
some natural disaster or administrator interference. It might be the
case that our inability to do certain things safely during transaction
commit puts us out of compliance with the spec, but it can't be the
case that some other system has no possible failures during
transaction commit. The problem of the network potentially being
disconnected between one packet and the next exists in every system.

> I don't think the resolver-based approach would bring us far enough. It's fundamentally a bottleneck. Such a background process should only handle commits whose requests failed to be sent due to server down.

Why is it fundamentally a bottleneck? It seems to me in some cases it
could scale better than any other approach. If we have to commit on
100 shards in only one process we can only do those commits one at a
time. If we can use resolver processes we could do all 100 at once if
the user can afford to run that many resolvers, which should be way
faster. It is true that if the resolver does not have a connection
open and must open one, that might be slow, but presumably after that
it can keep the connection open and reuse it for subsequent
distributed transactions. I don't really see why that should be
particularly slow.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2021-06-11 12:52:29 Re: Added schema level support for publication.
Previous Message Michael Paquier 2021-06-11 12:36:57 Re: Fix dropped object handling in pg_event_trigger_ddl_commands