Re: eXtensible Transaction Manager API

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: eXtensible Transaction Manager API
Date: 2015-11-15 20:06:05
Message-ID: 5648E5AD.6030401@postgrespro.ru
Lists: pgsql-hackers

I am sorry, it looks like I incorrectly interpreted Michael's comment:
> Not always. If COMMIT PREPARED fails at some of the nodes but succeeds
> on others, the transaction is already partially acknowledged as
> committed in the cluster. Hence it makes more sense for the
> coordinator to commit transactions on the remaining nodes. Before
> issuing any COMMIT PREPARED queries, I guess that's fine to rollback
> the transactions on all nodes though.
I completely agree that at the second stage of 2PC, once we have made the decision to commit the transaction, there is no way back: we have to complete the commit on all nodes. If a node reports that it has successfully prepared the transaction and is ready to commit, then it
means that the node should be able to recover it in case of failure. In my Go code example illustrating work with 2PC, I completely excluded the error-handling logic for simplicity.
Actually it should be something like this (now in C++ using pqxx):

nontransaction srcTxn(srcCon);
nontransaction dstTxn(dstCon);
string transComplete;
try {
    exec(srcTxn, "prepare transaction '%s'", gtid);
    exec(dstTxn, "prepare transaction '%s'", gtid);
    // At this moment all nodes have prepared the transaction, so in principle we can make the decision to commit here...
    // But it is easier to wait until a CSN is assigned to the transaction and delivered to all nodes
    exec(srcTxn, "select dtm_begin_prepare('%s')", gtid);
    exec(dstTxn, "select dtm_begin_prepare('%s')", gtid);
    csn = execQuery(srcTxn, "select dtm_prepare('%s', 0)", gtid);
    csn = execQuery(dstTxn, "select dtm_prepare('%s', %lu)", gtid, csn);
    exec(srcTxn, "select dtm_end_prepare('%s', %lu)", gtid, csn);
    exec(dstTxn, "select dtm_end_prepare('%s', %lu)", gtid, csn);
    // If we reach this point, we make the decision to commit...
    transComplete = string("commit prepared '") + gtid + "'";
} catch (pqxx_exception const& x) {
    // ... if not, then roll back the prepared transaction
    transComplete = string("rollback prepared '") + gtid + "'";
}
// Send the final statement (commit or rollback) to both nodes
pipeline srcPipe(srcTxn);
pipeline dstPipe(dstTxn);
srcPipe.insert(transComplete);
dstPipe.insert(transComplete);
srcPipe.complete();
dstPipe.complete();
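
The snippet above sends the final statement to each node only once. As a minimal sketch of the retry behaviour usually expected in the second phase (keep retrying a failed node, and hand it over to an administrator if it never answers), something like the following could wrap that step. The `Node` callable and `completeOnAllNodes` are hypothetical names used only for illustration; they are not part of pqxx or of the DTM extension, and the retry limit is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for one participant node: executes a statement and
// returns true once the node has acknowledged it, false on any failure.
using Node = std::function<bool(const std::string&)>;

// Send the final 2PC statement to every node, retrying a failing node up to
// maxAttempts times. Returns the indexes of the nodes that never
// acknowledged, i.e. the ones left for an administrator to resolve.
std::vector<std::size_t> completeOnAllNodes(const std::vector<Node>& nodes,
                                            const std::string& statement,
                                            int maxAttempts) {
    assert(maxAttempts > 0);
    std::vector<std::size_t> unresolved;
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        bool done = false;
        for (int attempt = 0; attempt < maxAttempts && !done; ++attempt) {
            done = nodes[i](statement);
        }
        if (!done) {
            unresolved.push_back(i);
        }
    }
    return unresolved;
}
```

With pqxx, each `Node` would presumably execute the statement on its own connection and return false when an exception is caught. A retried `commit prepared` can also fail simply because an earlier attempt succeeded and only the acknowledgement was lost, so a real implementation would have to treat that case as success.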

On 11/15/2015 08:59 PM, Kevin Grittner wrote:
> On Saturday, November 14, 2015 8:41 AM, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
>> On 13 November 2015 at 21:35, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote:
>>> On Tue, Nov 10, 2015 at 3:46 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>>> If the database is corrupted, there's no way to guarantee that
>>>> anything works as planned. This is like saying that criticizing
>>>> somebody's disaster recovery plan on the basis that it will be
>>>> inadequate if the entire planet earth is destroyed.
> +1
>
> Once all participating servers return "success" from the "prepare"
> phase, the only sane way to proceed is to commit as much as
> possible. (I guess you still have some room to argue about what to
> do after an error on the attempt to commit the first prepared
> transaction, but even there you need to move forward if it might
> have been due to a communications failure preventing a success
> indication for a commit which was actually successful.) The idea
> of two phase commit is that failure to commit after a successful
> prepare should be extremely rare.
>
>>> As well as there could be FS, OS, network problems... To come back to
>>> the point, my point is simply that I found surprising the sentence of
>>> Konstantin upthread saying that if commit fails on some of the nodes
>>> we should rollback the prepared transaction on all nodes. In the
>>> example given, in the phase after calling dtm_end_prepare, say we
>>> perform COMMIT PREPARED correctly on node 1, but then failed it on
>>> node 2 because a meteor has hit a server, it seems that we cannot
>>> rollback, instead we had better rolling in a backup and be sure that
>>> the transaction gets committed. How would you rollback the transaction
>>> already committed on node 1? But perhaps I missed something...
> Right.
>
>> The usual way this works in an XA-like model is:
>>
>> In phase 1 (prepare transaction, in Pg's spelling), failure on
>> any node triggers a rollback on all nodes.
>>
>> In phase 2 (commit prepared), failure on any node causes retries
>> until it succeeds, or until the admin intervenes - say, to remove
>> that node from operation. The global xact as a whole isn't
>> considered successful until it's committed on all nodes.
>>
>> 2PC and distributed commit is well studied, including the
>> problems. We don't have to think this up for ourselves. We don't
>> have to invent anything here. There's a lot of distributed
>> systems theory to work with - especially when dealing with well
>> studied relational DBs trying to maintain ACID semantics.
> Right; it is silly not to build on decades of work on theory and
> practice in this area.
>
> What is making me nervous as I watch this thread is a bit of loose
> talk about the I in ACID. There have been some points where it
> seemed to be implied that we had sufficient transaction isolation
> if we could get all the participating nodes using the same
> snapshot. I think I've seen some hint that there is an intention
> to use distributed strict two-phase locking with a global lock
> manager to achieve serializable behavior. The former is vulnerable
> to some well-known serialization anomalies and the latter was
> dropped from PostgreSQL when MVCC was implemented because of its
> horrible concurrency and performance characteristics, which is not
> going to be less of an issue in a distributed system using that
> technique. It may be possible to implement the Serializable
> Snapshot Isolation (SSI) technique across multiple cooperating
> nodes, but it's not obvious exactly how that would be implemented,
> and I have seen no discussion of that.
>
> If we're going to talk about maintaining ACID semantics in this
> environment, I think we need to explicitly state what level of
> isolation we intend to provide, and how we intend to do that.
>
> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
