Re: Transactions involving multiple postgres foreign servers

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>
Cc: Kevin Grittner <kgrittn(at)ymail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Transactions involving multiple postgres foreign servers
Date: 2015-01-10 13:11:07
Message-ID: CAB7nPqQrRpTR1RzCeee9LS3vAShcHMX4CGsGyrvF=Ldb4jpZ0w@mail.gmail.com
Lists: pgsql-hackers

On Sat, Jan 10, 2015 at 9:02 AM, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com> wrote:
> On 1/8/15, 12:00 PM, Kevin Grittner wrote:
>> The key point is that the distributed transaction data must be
>> flagged as needing to commit rather than roll back between the
>> prepare phase and the final commit. If you try to avoid the
>> PREPARE, flagging, COMMIT PREPARED sequence by building the
>> flagging of the distributed transaction metadata into the COMMIT
>> process, you still have the problem of what to do on crash
>> recovery. You really need to use 2PC to keep that clean, I think.
Yes, 2PC is needed as soon as two or more nodes perform write
operations within a transaction.

> If we had an independent transaction coordinator then I agree with you
> Kevin. I think Robert is proposing that if we are controlling one of the
> nodes that's participating as well as coordinating the overall transaction
> that we can take some shortcuts. AIUI a PREPARE means you are completely
> ready to commit. In essence you're just waiting to write and fsync the
> commit message. That is in fact the state that a coordinating PG node would
> be in by the time everyone else has done their prepare. So from that
> standpoint we're OK.
>
> Now, as soon as ANY of the nodes commit, our coordinating node MUST be able
> to commit as well! That would require it to have a real prepared transaction
> of its own created. However, as long as there is zero chance of any other
> prepared transactions committing before our local transaction, that step
> isn't actually needed. Our local transaction will either commit or abort,
> and that will determine what needs to happen on all other nodes.

2PC guarantees that a prepared transaction can be committed. Now,
once the coordinator has confirmed that all the remote nodes have
successfully PREPAREd, it issues COMMIT PREPARED to each node. What
do you do if some nodes report ABORT PREPARED while other nodes
report COMMIT PREPARED? Do you abort the transaction on the
coordinator, commit it, or FATAL? This leaves the cluster in an
inconsistent state, meaning that some consistent cluster-wide
recovery point is needed as well (Postgres-XC and XL have introduced
the concept of barriers for such problems, work first done by
Pavan Deolasee).
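
To make the failure mode concrete, here is a rough sketch of a coordinator driving the two phases. It is a toy simulation, not postgres_fdw code: the Node class, its fail_prepare/fail_commit switches, the state names and run_2pc() are all hypothetical names made up for illustration. The three methods only stand in for PREPARE TRANSACTION, COMMIT PREPARED and ROLLBACK PREPARED.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for a foreign server (not a real PostgreSQL API)."""
    name: str
    fail_prepare: bool = False   # simulate a failure during phase 1
    fail_commit: bool = False    # simulate a failure during phase 2
    state: str = "active"        # active -> prepared -> committed | aborted

    def prepare(self) -> bool:
        # Stands in for PREPARE TRANSACTION 'gid'.
        if self.fail_prepare:
            self.state = "aborted"
            return False
        self.state = "prepared"
        return True

    def commit_prepared(self) -> bool:
        # Stands in for COMMIT PREPARED 'gid'. On failure (say, the node
        # is unreachable) the transaction simply stays prepared.
        if self.fail_commit:
            return False
        self.state = "committed"
        return True

    def rollback_prepared(self) -> None:
        # Stands in for ROLLBACK PREPARED 'gid'.
        self.state = "aborted"


def run_2pc(nodes):
    """Return 'committed', 'aborted' or 'in-doubt' for the whole transaction."""
    # Phase 1: every node must prepare, otherwise everything is rolled back.
    prepared = []
    for node in nodes:
        if node.prepare():
            prepared.append(node)
            continue
        for p in prepared:
            p.rollback_prepared()
        for other in nodes:
            if other.state == "active":
                other.state = "aborted"   # plain rollback, never prepared
        return "aborted"

    # Phase 2: past this point the only consistent outcome is commit.
    pending = [node for node in nodes if not node.commit_prepared()]
    return "committed" if not pending else "in-doubt"


nodes = [Node("a"), Node("b", fail_commit=True)]
outcome = run_2pc(nodes)
print(outcome, [(n.name, n.state) for n in nodes])
```

In this sketch a phase-2 failure leaves node "b" prepared while "a" is already committed. The coordinator can no longer abort; it can only retry COMMIT PREPARED on "b" later, and until then the nodes disagree, which is the kind of window a cluster-wide consistent recovery point has to account for.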
--
Michael
