Re: Question concerning XTM (eXtensible Transaction Manager API)

From: konstantin knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Question concerning XTM (eXtensible Transaction Manager API)
Date: 2015-11-17 06:42:54
Message-ID: 52EA4242-F07E-46EA-B999-D207021C9675@postgrespro.ru
Lists: pgsql-hackers

Thank you for your response.

On Nov 16, 2015, at 11:21 PM, Kevin Grittner wrote:
> I'm not entirely clear on what you're saying here. I admit I've
> not kept in close touch with the distributed processing discussions
> lately -- is there a write-up and/or diagram to give an overview of
> where we're at with this effort?

https://wiki.postgresql.org/wiki/DTM

>
> If you are saying that DTM tries to roll back a transaction after
> any participating server has entered the RecordTransactionCommit()
> critical section, then IMO it is broken. Full stop. That can't
> work with any reasonable semantics as far as I can see.

DTM is not trying to roll back a committed transaction.
What it tries to do is hide this commit.
As I already wrote, the idea was to implement a "lightweight" 2PC, because the prepared transactions mechanism in PostgreSQL adds too much overhead and causes some problems with recovery.

The transaction is committed in the xlog as usual, so that it can always be recovered in case of a node fault.
But before setting the corresponding bit(s) in CLOG and releasing locks, we first contact the arbiter to get the global status of the transaction.
If it has been successfully committed locally by all nodes, the arbiter approves the commit and the commit of the transaction completes normally.
Otherwise the arbiter rejects the commit. In this case DTM marks the transaction as aborted in CLOG and returns an error to the client.
The XLOG is not changed, so in case of failure PostgreSQL will try to replay this transaction.
But during recovery it also tries to restore the transaction status in CLOG.
At this point DTM contacts the arbiter to learn the status of the transaction.
If it is marked as aborted in the arbiter's CLOG, then it will also be marked as aborted in the local CLOG.
And according to PostgreSQL visibility rules, no other transaction will see the changes made by this transaction.
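
To make the flow above concrete, here is a rough toy model in Python. It is only a sketch of the idea, not the pg_dtm code: the classes Arbiter and Node, all method names, and the in-memory "xlog" and "clog" are invented for illustration, and real locking, networking and WAL replay are left out. It also shows the point about not raising an error inside the critical section: the rejection is remembered there and reported only afterwards.

```python
from enum import Enum


class Status(Enum):
    COMMITTED = "committed"
    ABORTED = "aborted"


class Arbiter:
    """Hypothetical arbiter: approves a commit only if every node logged it."""

    def __init__(self, participants):
        self.participants = set(participants)
        self.committed_locally = {}  # xid -> names of nodes that wrote a commit record
        self.decisions = {}          # xid -> bool, the arbiter's own "CLOG"

    def local_commit_done(self, xid, node_name):
        self.committed_locally.setdefault(xid, set()).add(node_name)

    def approves(self, xid):
        # The first verdict is final, so recovery sees the same answer later.
        if xid not in self.decisions:
            done = self.committed_locally.get(xid, set())
            self.decisions[xid] = done == self.participants
        return self.decisions[xid]


class Node:
    """Hypothetical node with an append-only xlog and a rebuildable clog."""

    def __init__(self, name, arbiter):
        self.name = name
        self.arbiter = arbiter
        self.xlog = []  # xids that have a commit record
        self.clog = {}  # xid -> Status

    def log_commit(self, xid):
        # The commit record goes to the xlog first; the clog is not touched yet.
        self.xlog.append(xid)
        self.arbiter.local_commit_done(xid, self.name)

    def finish_commit(self, xid):
        # "Critical section": record the arbiter's verdict, but do not raise here.
        approved = self.arbiter.approves(xid)
        self.clog[xid] = Status.COMMITTED if approved else Status.ABORTED
        rejected = not approved
        # Outside the critical section: now it is safe to report the error.
        if rejected:
            raise RuntimeError(f"transaction {xid} rejected by arbiter")

    def recover(self):
        # The xlog is replayed as is; the status comes from the arbiter again.
        self.clog = {}
        for xid in self.xlog:
            approved = self.arbiter.approves(xid)
            self.clog[xid] = Status.COMMITTED if approved else Status.ABORTED

    def is_visible(self, xid):
        # Stand-in for the visibility rules: only committed xids are seen.
        return self.clog.get(xid) is Status.COMMITTED
```

In this model a rejected transaction still has its commit record in the xlog, but after recover() it stays marked as aborted and is_visible() returns False for it, which is the "hidden commit" described above.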

>
>> We can not just call elog(ERROR,...) in SetTransactionStatus
>> implementation because inside critical section it cause Postgres
>> crash with panic message. So we have to remember that transaction is
>> rejected and report error later after exit from critical section:
>
> I don't believe that is a good plan. You should not enter the
> critical section for recording that a commit is complete until all
> the work for the commit is done except for telling the all the
> servers that all servers are ready.

That is a good point.
Maybe it is the reason for the performance and scalability problems we have noticed with DTM.

>> In our benchmarks we have found that simple credit-debit banking
>> test (without any DTM) works almost 10 times slower with PostgreSQL
>> 2PC than without it. This is why we try to propose alternative
>> solution (right now pg_dtm is 2 times slower than vanilla
>> PostgreSQL, but it not only performs 2PC but also provide consistent
>> snapshots).
>
> Are you talking about 10x the latency on a commit, or that the
> overall throughput under saturation load is one tenth of running
> without something to guarantee the transactional integrity of the
> whole set of nodes? The former would not be too surprising, while
> the latter would be rather amazing.

Sorry, a clarification.
We get a 10x slowdown caused by 2PC under very heavy load on an IBM system with 256 cores.
On "normal" servers the slowdown from 2PC is smaller: about 2x.

>
> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
