RE: Transactions involving multiple postgres foreign servers, take 2

From: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
To: 'Robert Haas' <robertmhaas(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Masahiro Ikeda <ikedamsh(at)oss(dot)nttdata(dot)com>, Zhihong Yu <zyu(at)yugabyte(dot)com>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "ashutosh(dot)bapat(dot)oss(at)gmail(dot)com" <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, "amit(dot)kapila16(at)gmail(dot)com" <amit(dot)kapila16(at)gmail(dot)com>, "m(dot)usama(at)gmail(dot)com" <m(dot)usama(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "sulamul(at)gmail(dot)com" <sulamul(at)gmail(dot)com>, "alvherre(at)2ndquadrant(dot)com" <alvherre(at)2ndquadrant(dot)com>, "thomas(dot)munro(at)gmail(dot)com" <thomas(dot)munro(at)gmail(dot)com>, "ildar(at)adjust(dot)com" <ildar(at)adjust(dot)com>, "horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp" <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, "chris(dot)travers(at)adjust(dot)com" <chris(dot)travers(at)adjust(dot)com>, "ishii(at)sraoss(dot)co(dot)jp" <ishii(at)sraoss(dot)co(dot)jp>
Subject: RE: Transactions involving multiple postgres foreign servers, take 2
Date: 2021-06-11 01:58:26
Message-ID: TYAPR01MB29902E2186B928C48D3BA21FFE349@TYAPR01MB2990.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

From: Robert Haas <robertmhaas(at)gmail(dot)com>
> That is completely unrealistic. As Sawada-san has pointed out
> repeatedly, there are tons of things that can go wrong even after the
> remote side has prepared the transaction. Preparing a transaction only
> promises that the remote side will let you commit the transaction upon
> request. It doesn't guarantee that you'll be able to make the request.
> Like Sawada-san says, network problems, out of memory issues, or many
> other things could stop that from happening. Someone could come along
> in another session and run "ROLLBACK PREPARED" on the remote side, and
> now the "COMMIT PREPARED" will never succeed no matter how many times
> you try it. At least, not unless someone goes and creates a new
> prepared transaction with the same 2PC identifier, but then you won't
> be committing the correct transaction anyway. Or someone could take
> the remote server and drop it in a volcano. How do you propose that we
> avoid giving the user an error after the remote server has been
> dropped into a volcano, even though the local node is still alive?

I understand that. As I cited yesterday, and possibly before, that is exactly why xa_commit() returns various return codes. I have never suggested that FDWs should always report success for the commit request; they should be allowed to report an error.

The question I have been asking is how. As far as I can see, there are only two options: one is the return value of the FDW commit routine, and the other is ereport(ERROR). I suggested the former, because if the FDW does ereport(ERROR), the Postgres core (transaction manager) may have difficulty handling the rest of the participants.

> Also, leaving aside theoretical arguments, I think it's not
> realistically possible for an FDW author to write code to commit a
> prepared transaction that will be safe in the context of running late
> in PrepareTransaction(), after we've already done
> RecordTransactionCommit(). Such code can't avoid throwing errors
> because it can't avoid performing operations and allocating memory.

I'm not completely sure about this. I thought (and said) that the only thing the FDW would do is send a commit request through an existing connection. So, I think it's not a severe restriction to require FDWs not to do ereport(ERROR) during commits (i.e., the second phase of 2PC).

> It's already been mentioned that, if an ERROR is thrown, it would be
> reported to the user in place of the COMMIT acknowledgement that they
> are expecting. Now, it has also been suggested that we could downgrade
> the ERROR to a WARNING and still report the COMMIT. That doesn't sound
> easy to do, because when the ERROR happens, control is going to jump
> to AbortTransaction(). But even if you could hack it so it works like
> that, it doesn't really solve the problem. What about all of the other
> servers where the prepared transaction also needs to be committed? In
> the design of PostgreSQL, in all circumstances, the way you recover
> from an error is to abort the transaction. That is what brings the
> system back to a clean state. You can't simply ignore the requirement
> to abort the transaction and keep doing more work. It will never be
> reliable, and Tom will instantaneously demand that any code that works
> like that be reverted -- and for good reason.

(I took "abort" as the same as "rollback" here.) Once we've sent commit requests to some participants, we can't abort the transaction. If one FDW returns an error partway through, we still need to send commit requests to the rest of the participants.

It's a design question, as I have repeatedly said, whether and how we should report the errors of some participants to the client. For instance, how should we report the errors of multiple participants? Concatenate their error messages?

Anyway, we should design the interface first, giving much thought to, and respecting, the ideas of predecessors (TX/XA, MS DTC, JTA/JTS). Otherwise, we may end up saying, "We implemented it like this, so the interface is like this and it can only behave like this, although you may find it strange..." That might be a situation similar to the one suggested by your comment "in the design of PostgreSQL, in all circumstances, the way you recover from an error is to abort the transaction" -- Postgres doesn't have statement-level rollback.

> I am not sure that it's 100% impossible to find a way to solve this
> problem without just having the resolver do all the work, but I think
> it's going to be extremely difficult. We tried to figure out some
> vaguely similar things while working on undo, and it really didn't go
> very well. The later stages of CommitTransaction() and
> AbortTransaction() are places where very few kinds of code are safe to
> execute, and finding a way to patch around that problem is not simple
> either. If the resolver performance is poor, perhaps we could try to
> find a way to improve it. I don't know. But I don't think it does any
> good to say, well, no errors can occur after the remote transaction is
> prepared. That's clearly incorrect.

I don't think the resolver-based approach would take us far enough; it's fundamentally a bottleneck. Such a background process should only handle commits whose requests failed to be sent because the server was down.

My requests are only twofold, and they haven't changed for a long time: design an FDW interface that implementors can naturally follow, and design for performance.

Regards
Takayuki Tsunakawa
