Re: Transactions involving multiple postgres foreign servers

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Transactions involving multiple postgres foreign servers
Date: 2015-01-08 14:54:18
Message-ID: CA+TgmoZB+b9jgu1d41OX8XN6jJjirCDL1KvQqRMbWC2KOYhSyg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 7, 2015 at 11:20 AM, Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
> If we are going to include a distributed transaction manager with
> PostgreSQL, it *must* persist enough information about the
> transaction ID and where it is used in a way that will survive a
> subsequent crash before beginning the PREPARE on any of the
> systems. After all nodes are PREPAREd it must flag that persisted
> data to indicate that it is now at a point where ROLLBACK is no
> longer an option. Only then can it start committing the prepared
> transactions. After the last node is committed it can clear this
> information. On start-up the distributed transaction manager must
> check for any distributed transactions left "in progress" and
> commit or rollback based on the preceding; doing retries
> indefinitely until it succeeds or is told to stop.
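
To make that concrete, the state that has to survive a crash might look
roughly like this; this is only an illustrative sketch, and every name in
it (DistributedTxnState, persisted_record, the server identifiers) is
invented for the example, nothing like it exists today:

from enum import Enum

class DistributedTxnState(Enum):
    # Persisted before the first PREPARE is sent to any node; on
    # recovery, still-prepared participants may still be rolled back.
    PREPARING = 1
    # Persisted once every node has prepared successfully; from this
    # point on, ROLLBACK is no longer an option.
    COMMITTING = 2

# Illustrative shape of what must be made durable: the local
# transaction's identity plus every participant it touched.
persisted_record = {
    "local_xid": 12345,                            # our transaction ID
    "participants": ["fdw_node_a", "fdw_node_b"],  # foreign servers involved
    "state": DistributedTxnState.PREPARING,
}
# After the last node commits, the record is removed entirely; at
# startup, any record still present must be resolved (committed or
# rolled back), retrying until it succeeds or an operator intervenes.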

I think one key question here is whether all of this should be handled
in PostgreSQL core or whether some of it should be handled in other
ways. Is the goal to make postgres_fdw (and FDWs for other databases
that support 2PC) persist enough information that someone *could*
write a transaction manager for PostgreSQL, or is the goal to actually
write that transaction manager?

Just figuring out how to persist the necessary information is a
non-trivial problem by itself. You might think that you could just
insert a row into a local table saying, hey, I'm about to prepare a
transaction remotely, but of course that doesn't work: if you then go
on to PREPARE before writing and flushing the local commit record,
then a crash before that's done leaves a dangling prepared transaction
on the remote node. You might think to write the record, then write
and flush the local commit record, and only then do the PREPARE. But you
can't do that either, because now if the PREPARE fails you've already
committed locally.
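
To spell that out, here is a sketch of the two broken orderings;
insert_intent_row, prepare_remote, and commit_local are hypothetical
stand-ins for the real operations, the point is only the ordering:

# Hypothetical stand-ins for the real operations.
def insert_intent_row(): ...   # insert "about to prepare" into a local table
def prepare_remote(): ...      # PREPARE TRANSACTION on the foreign server
def commit_local(): ...        # write and flush the local commit record

def broken_ordering_1():
    insert_intent_row()
    prepare_remote()
    # <-- crash here: the local insert was never committed, so nothing
    #     durable records the remote prepared transaction; it dangles.
    commit_local()

def broken_ordering_2():
    insert_intent_row()
    commit_local()
    # <-- failure here: we've already committed locally, but the remote
    #     PREPARE can still fail, and the local commit can't be undone.
    prepare_remote()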

I guess what you need to do is something like:

1. Write and flush a WAL record indicating an intent to prepare, with
a list of foreign server OIDs and GUIDs.
2. Prepare the remote transaction on each node. If any of those
operations fail, roll back any prepared nodes and error out.
3. Commit locally (i.e. RecordTransactionCommit, writing and flushing WAL).
4. Try to commit the remote transactions.
5. Write a WAL record indicating that you committed the remote transactions OK.
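
As a sketch of that sequence (all of the helper names below are invented
stand-ins for the WAL and remote operations, not real APIs):

import uuid

# Hypothetical stand-ins; illustrative only.
def wal_write_and_flush(rec): ...
def wal_write(rec): ...
def local_commit(): ...                       # RecordTransactionCommit
def remote_prepare(srv, guid): ...            # PREPARE TRANSACTION 'guid'
def remote_commit_prepared(srv, guid): ...    # COMMIT PREPARED 'guid'
def remote_rollback_prepared(srv, guid): ...  # ROLLBACK PREPARED 'guid'

def commit_distributed(foreign_server_oids):
    guids = {oid: str(uuid.uuid4()) for oid in foreign_server_oids}

    # 1. Write and flush a WAL record of the intent to prepare,
    #    listing foreign server OIDs and GUIDs.
    wal_write_and_flush(("INTENT_TO_PREPARE", guids))

    # 2. Prepare on each node; on any failure, roll back whatever
    #    was already prepared and error out.
    prepared = []
    try:
        for oid, guid in guids.items():
            remote_prepare(oid, guid)
            prepared.append(oid)
    except Exception:
        for oid in prepared:
            remote_rollback_prepared(oid, guids[oid])
        raise

    # 3. Commit locally, writing and flushing the local commit record.
    local_commit()

    # 4. Commit the remote prepared transactions; failures past this
    #    point have to be retried, since the local commit already happened.
    for oid, guid in guids.items():
        remote_commit_prepared(oid, guid)

    # 5. Record that the remote commits completed, so recovery can
    #    forget this transaction.
    wal_write(("REMOTE_COMMITS_DONE", guids))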

If you fail after step 1, you can straighten things out by looking at
the status of the transaction: any remote transactions we intended to
prepare need to be checked, and if they are still prepared, we need to
commit them or roll them back according to what happened to our XID.
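
In pseudocode, that recovery-time check might look something like this
(again, every helper name here is invented for illustration):

# Hypothetical stand-ins; illustrative only.
def pending_intent_records(): ...             # intent records with no "done" record
def local_xid_committed(xid): ...             # did the local transaction commit?
def remote_is_prepared(srv, guid): ...        # does the prepared txn still exist?
def remote_commit_prepared(srv, guid): ...
def remote_rollback_prepared(srv, guid): ...

def recover_in_doubt_transactions():
    for xid, guids in pending_intent_records():
        committed = local_xid_committed(xid)
        for srv, guid in guids.items():
            if not remote_is_prepared(srv, guid):
                continue                      # nothing left to resolve on this node
            if committed:
                remote_commit_prepared(srv, guid)
            else:
                remote_rollback_prepared(srv, guid)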

(Andres is talking in my other ear suggesting that we ought to reuse
the 2PC infrastructure to do all this. I'm not convinced that's a
good idea, but I'll let him present his own ideas here if he wants to
rather than trying to explain them myself.)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
