Re: eXtensible Transaction Manager API

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: eXtensible Transaction Manager API
Date: 2015-11-07 16:53:32
Message-ID: 563E2C8C.5000204@postgrespro.ru
Lists: pgsql-hackers

Hi,
Thank you for your feedback.
My comments are inline.

On 11/07/2015 05:11 PM, Amit Kapila wrote:
>
> Today, while studying your proposal and related material, I noticed
> that in both the approaches DTM and tsDTM, you are talking about
> committing a transaction and acquiring the snapshot consistently, but
> not touched upon the how the locks will be managed across nodes and
> how deadlock detection across nodes will work. This will also be one
> of the crucial points in selecting one of the approaches.

The lock manager is one of the tasks we are currently working on.
There are still a lot of open questions:
1. Should the distributed lock manager (DLM) do anything besides detecting distributed deadlocks?
2. Should the DLM be part of the XTM API, or should it be a separate API?
3. Should the DLM be implemented as a separate process, or should it be part of the arbiter (dtmd)?
4. How should resource owners (transactions) be identified globally in the global lock graph? With DTM we have global (shared) XIDs,
and with tsDTM we have global transaction IDs assigned by the application (although it is not so clear how to retrieve them).
In other cases we may need a local->global transaction ID mapping, so it looks like the DLM should be part of the DTM...
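
To make question 4 concrete, here is a minimal sketch of a global wait-for graph keyed by global transaction IDs. It is only an illustration of the idea, not the DLM we are building: the names LocalXID, WaitGraph, Register, AddWait and HasDeadlock are hypothetical, and it assumes each node reports "waiter waits for holder" edges using its local XIDs.

```go
package main

// LocalXID identifies a transaction on a single node (hypothetical type).
type LocalXID struct {
	Node string
	XID  uint32
}

// WaitGraph is a hypothetical global lock graph: wait edges reported by
// individual nodes are translated to global transaction IDs before storage.
type WaitGraph struct {
	global map[LocalXID]string        // local -> global transaction ID
	waits  map[string]map[string]bool // waiter -> set of holders
}

func NewWaitGraph() *WaitGraph {
	return &WaitGraph{
		global: make(map[LocalXID]string),
		waits:  make(map[string]map[string]bool),
	}
}

// Register records that local transaction l belongs to global transaction gid.
func (g *WaitGraph) Register(l LocalXID, gid string) {
	g.global[l] = gid
}

// AddWait records that waiter is blocked by holder. It returns false when
// either local XID has no known global ID, in which case no edge is added.
func (g *WaitGraph) AddWait(waiter, holder LocalXID) bool {
	w, okW := g.global[waiter]
	h, okH := g.global[holder]
	if !okW || !okH {
		return false
	}
	if w == h {
		return true // both branches belong to one global transaction
	}
	if g.waits[w] == nil {
		g.waits[w] = make(map[string]bool)
	}
	g.waits[w][h] = true
	return true
}

// HasDeadlock reports whether the global graph contains a cycle, using a
// depth-first search with the usual unvisited/in-progress/done marking.
func (g *WaitGraph) HasDeadlock() bool {
	const (
		inProgress = 1
		done       = 2
	)
	state := make(map[string]int)
	var visit func(t string) bool
	visit = func(t string) bool {
		state[t] = inProgress
		for h := range g.waits[t] {
			if state[h] == inProgress {
				return true // back edge: a cycle of waiters
			}
			if state[h] == 0 && visit(h) {
				return true
			}
		}
		state[t] = done
		return false
	}
	for t := range g.waits {
		if state[t] == 0 && visit(t) {
			return true
		}
	}
	return false
}
```

With such a mapping, a cycle that no single node can see (A waits for B on one node, B waits for A on another) becomes visible once both edges are expressed in global IDs.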

> Also I have
> noticed that discussion about Rollback is not there, example how will
> Rollback happen with API's provided in your second approach (tsDTM)?

In the tsDTM approach, two-phase commit is performed by the coordinator and currently uses the standard PostgreSQL two-phase commit.

Go code performing two-phase commit:

// Prepare the transaction on both nodes under the same global ID.
exec(conn1, "prepare transaction '" + gtid + "'")
exec(conn2, "prepare transaction '" + gtid + "'")
exec(conn1, "select dtm_begin_prepare($1)", gtid)
exec(conn2, "select dtm_begin_prepare($1)", gtid)
// The CSN returned by the first node is passed on to the second node.
csn = _execQuery(conn1, "select dtm_prepare($1, 0)", gtid)
csn = _execQuery(conn2, "select dtm_prepare($1, $2)", gtid, csn)
// Both nodes receive the same resulting CSN.
exec(conn1, "select dtm_end_prepare($1, $2)", gtid, csn)
exec(conn2, "select dtm_end_prepare($1, $2)", gtid, csn)
exec(conn1, "commit prepared '" + gtid + "'")
exec(conn2, "commit prepared '" + gtid + "'")

If the commit fails on any of the nodes, the coordinator should roll back the prepared transaction on all nodes.
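
A rough sketch of what such coordinator-side rollback could look like is below. It is an illustration under assumptions, not the actual coordinator code: the Node interface, commitOrRollback and fakeNode are hypothetical names, the dtm_* calls are left out for brevity, and only a failure in the prepare phase is rolled back. A failure after some node has already executed COMMIT PREPARED cannot be undone on that node, so the sketch just reports it.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Node is a hypothetical stand-in for one database connection.
type Node interface {
	Exec(query string) error
}

// commitOrRollback prepares gtid on every node and commits only if every
// prepare succeeded. On a prepare failure it undoes the work on all nodes.
func commitOrRollback(nodes []Node, gtid string) error {
	for i, n := range nodes {
		if err := n.Exec("prepare transaction '" + gtid + "'"); err != nil {
			// Nodes before i already hold a prepared transaction.
			// Rollback errors are ignored here to keep the sketch short.
			for _, p := range nodes[:i] {
				_ = p.Exec("rollback prepared '" + gtid + "'")
			}
			// Nodes after i still have an ordinary open transaction.
			for _, r := range nodes[i+1:] {
				_ = r.Exec("rollback")
			}
			// Node i needs nothing: in PostgreSQL a failed
			// PREPARE TRANSACTION cancels the current transaction.
			return fmt.Errorf("prepare failed on node %d: %w", i, err)
		}
	}
	for i, n := range nodes {
		if err := n.Exec("commit prepared '" + gtid + "'"); err != nil {
			// Earlier nodes may already be committed; recovery is out of scope.
			return fmt.Errorf("commit prepared failed on node %d: %w", i, err)
		}
	}
	return nil
}

// fakeNode is a test double that records statements and fails on any
// statement starting with failOn.
type fakeNode struct {
	failOn string
	log    []string
}

func (f *fakeNode) Exec(query string) error {
	f.log = append(f.log, query)
	if f.failOn != "" && strings.HasPrefix(query, f.failOn) {
		return errors.New("injected failure")
	}
	return nil
}
```

Calling commitOrRollback with two nodes where the second one fails to prepare leaves the first node with a "rollback prepared" statement and returns an error to the application.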

> Similarly, having some discussion on parts of recovery that could be affected
> would be great.

We are currently implementing fault tolerance and recovery for the DTM approach (with a centralized arbiter).
There are several replicas of the arbiter, synchronized using the Raft protocol.
But with the tsDTM approach the recovery model is still unclear...
We are thinking about it.
>
> I think in this patch, it is important to see the completeness of all the
> API's that needs to be exposed for the implementation of distributed
> transactions and the same is difficult to visualize without having complete
> picture of all the components that has some interaction with the distributed
> transaction system. On the other hand we can do it in incremental fashion
> as and when more parts of the design are clear.

That is exactly what we are going to do: we are trying to integrate DTM with existing systems (pg_shard, postgres_fdw, BDR) and find out what is missing and should be added. In parallel, we are trying to compare the efficiency and scalability of different solutions.
For example, we are still considering scalability problems with the tsDTM approach: to provide acceptable performance, it requires very precise clock synchronization (we have to use PTP instead of NTP). So it may be a waste of time to provide fault tolerance for tsDTM if we eventually find out that this approach cannot provide better scalability than the simpler DTM approach.

>
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
