Re: Transactions involving multiple postgres foreign servers

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: vinayak <Pokale_Vinayak_q3(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Vinayak Pokale <vinpokale(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Subject: Re: Transactions involving multiple postgres foreign servers
Date: 2017-01-12 11:23:36
Message-ID: CAD21AoA=FLZ6viCxwDkxJ+yOH4bp_TxrneFDtrvz+ipu25SQtg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Dec 23, 2016 at 1:49 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> On Fri, Dec 9, 2016 at 4:02 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>> On Fri, Dec 9, 2016 at 3:02 PM, vinayak <Pokale_Vinayak_q3(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>> On 2016/12/05 14:42, Ashutosh Bapat wrote:
>>>>
>>>> On Mon, Dec 5, 2016 at 11:04 AM, Haribabu Kommi
>>>> <kommi(dot)haribabu(at)gmail(dot)com> wrote:
>>>>
>>>>
>>>> On Fri, Nov 11, 2016 at 5:38 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
>>>> wrote:
>>>>>>
>>>>>>
>>>>>> 2PC is a basic building block to support the atomic commit and there
>>>>>> are some optimizations way in order to reduce disadvantage of 2PC. As
>>>>>> you mentioned, it's hard to support a single model that would suit
>>>>>> several type of FDWs. But even if it's not a purpose for sharding,
>>>>>> because many other database which could be connected to PostgreSQL via
>>>>>> FDW supports 2PC, 2PC for FDW would be useful for not only sharding
>>>>>> purpose. That's why I was focusing on implementing 2PC for FDW so far.
>>>>>
>>>>>
>>>>> Moved to next CF with "needs review" status.
>>>>
>>>> I think this should be changed to "returned with feedback.". The
>>>> design and approach itself needs to be discussed. I think, we should
>>>> let authors decide whether they want it to be added to the next
>>>> commitfest or not.
>>>>
>>>> When I first started with this work, Tom had suggested me to try to
>>>> make PREPARE and COMMIT/ROLLBACK PREPARED involving foreign servers or
>>>> at least postgres_fdw servers work. I think, most of my work that
>>>> Vinayak and Sawada have rebased to the latest master will be required
>>>> for getting what Tom suggested done. We wouldn't need a lot of changes
>>>> to that design. PREPARE involving foreign servers errors out right
>>>> now. If we start supporting prepared transactions involving foreign
>>>> servers that will be a good improvement over the current status-quo.
>>>> Once we get that done, we can continue working on the larger problem
>>>> of supporting ACID transactions involving foreign servers.
>>>
>>> In the pgconf ASIA depelopers meeting Bruce Momjian and other developers
>>> discussed
>>> on FDW based sharding [1]. The suggestions from other hackers was that we
>>> need to discuss
>>> the big picture and use cases of sharding. Bruce has listed all the building
>>> blocks of built-in sharding
>>> on wiki [2]. IIUC,transaction manager involving foreign servers is one part
>>> of sharding.
>>
>> Yeah, the 2PC on FDW is a basic building block for FDW based sharding
>> and it would be useful not only FDW sharding but also other purposes.
>> As far as I surveyed some papers the many kinds of distributed
>> transaction management architectures use the 2PC for atomic commit
>> with some optimisations. And using 2PC to provide atomic commit on
>> distributed transaction has much affinity with current PostgreSQL
>> implementation from some perspective.
>>
>>> As per the Bruce's wiki page there are two use cases for transactions
>>> involved multiple foreign servers:
>>> 1. Cross-node read-only queries on read/write shards:
>>> This will require a global snapshot manager to make sure the shards
>>> return consistent data.
>>> 2. Cross-node read-write queries:
>>> This will require a global snapshot manager and global transaction
>>> manager.
>>>
>>> I agree with you that if we start supporting PREPARE and COMMIT/ROLLBACK
>>> PREPARED
>>> involving foreign servers that will be good improvement.
>>>
>>> [1] https://wiki.postgresql.org/wiki/PgConf.Asia_2016_Developer_Meeting
>>> [2] https://wiki.postgresql.org/wiki/Built-in_Sharding
>>>
>>
>> I also agree to work on implementing the atomic commit across the
>> foreign servers and then continue to work on the more larger problem.
>> I think that this will be large step forward. I'm going to submit the
>> updated version patch to CF3.
>
> Attached latest version patches. Almost design is the same as previous
> patches and I incorporated some optimisations and updated
> documentation. But the documentation and regression test is not still
> enough.
>
> 000 patch adds some new FDW APIs to achive the atomic commit involving
> the foreign servers using two-phase-commit. If more than one foreign
> servers involve with the transaction or the transaction changes local
> data and involves even one foreign server, local node executes PREPARE
> and COMMIT/ROLLBACK PREPARED on foreign servers at commit. A lot of
> part of this implementation is inspired by two phase commit code. So I
> incorporated recent changes of two phase commit code, for example
> recovery speed improvement, into this patch.
> 001 patch makes postgres_fdw support atomic commit. If
> two_phase_commit is set 'on' to a foreign server, the two-phase-commit
> will be used at commit. 002 patch adds the pg_fdw_resolver new contrib
> module that is a bgworker process that resolves the in-doubt
> transaction on foreign server if there is.
>
> The reply might be late next week but feedback and review comment are
> very welcome.
>

Long time passed since original patch proposed by Ashutosh, so I
explain again about current design and functionality of this feature.
If you have any question, please feel free to ask.

Parameters
==========
The patch introduces max_foreign_prepared_transactions parameter and
two_phase_commit parameter.

two_phase_commit parameter is a new foreign server parameter, which
means that specified foreign server is capable of two phase commit
protocol. The modification transaction could be committed using two
phase commit protocol on foreign server with two_phase_commit = on. We
can set this parameter by CREATE/ALTER SERVER command.

max_foreign_prepared_transactions is a new GUC parameter, which
controls the upper bound of the number of transaction on foreign
servers the local transaction prepares. Note that it does not control
the number of transactions on local server that involves foreign
server. Since one transaction could prepare transaction on multiple
foreign servers, max_foreign_prepared_transactions should be set at
least ((max_connections) * (the number of foreign server with
two_phase_commit = on)). Changing this parameter requires restart.

Cluster-wide atomic commit
=======================
Since the distributed transaction commit on foreign servers are
executed independently, the transaction that modified data on the
multiple foreign servers is not ensured that transaction did either
all of them commit or all of them rollback. The patch adds the
functionality that guarantees distributed transaction did either
commit or rollback on all foreign servers. IOW the goal of this patch
is achieving the cluster-wide atomic commit across foreign server that
is capable two phase commit protocol. If the transaction modifies data
on multiple foreign servers and does COMMIT then the transaction is
committed or rollback-ed on foreign servers using two phase commit
protocol implicitly.

Transaction is committed or rollback-ed using two phase commit
protocol in following cases.
* The transaction changes local data.
* The transaction changes data on more than one foreign server whose
two_phase_commit is on.

In order to manage foreign transaction, the patch changes PostgreSQL
core so that it keeps track of foreign transaction. These entry is
exists on shared buffer but it's written to fdw_xact file in
$PGDATA/fdw_xact directory by checkpoint. We can check all foreign
transaction entries via pg_fdw_xacts system view.

The commit of distributed transaction using two phase commit protocol
is executed as follows;

In 1st phase, every foreign server with two_phase_commit = on needs to
register the connection to MyFDWConnection while starting new
transaction on a foreign connection using RegisterXactForeignServer().
During pre-commit phase following steps are executed.

1. Get transaction identifier used for PREPARE TRANSACTION on foreign servers.
2. Execute COMMIT on foreign server with two_phase_commit = off.
3. Register fdw_xact entry into shared memory and write
XLOG_FDW_XACT_INSERT WAL.
4. Execute PREPARE TRANSACTION on foreign server with two_phase_commit = on.

After that, local changes is committed (calls
RecordTransactionCommit()). Meantime of phase 1 and local commit, the
transaction could be failed due to serialization failure and
pre-commit of notify. In such case, all foreign transactions are
rollback-ed.

In 2nd phase, foreign transaction on foreign server with
two_phase_commit = off are already finished in 1st phase, so we focus
on only the foreign server with two_phase_commit = on. During commit
phase following steps are executed.

1. Resolve foreign prepared transaction.
2. Remove foreign transaction entry and write XLOG_FDW_XACT_REMOVE WAL.

In case server crashes after step 1 and before step 2, a resolved
foreign transaction will be considered unresolved when the local
server recovers or standby takes over the master. It will try to
resolve the prepared transaction again and should get an error from
foreign server.

Crash recovery
=============
During crash recovery, the fdw_xact entry are inserted to
KnownFDWXactList or removed from KnownFDWXact list when corresponding
WAL records are replayed. After the redo is done fdw_xact file is
re-created and then pg_fdw_xact directory is scanned for unresolved
foreign prepared transactions.

The files in this directory are named as triplet (xid, foreign server
oid, user oid) to create a unique name for each file. This scan also
emits the oldest transaction id with an unresolved prepared foreign
transactions. This affects oldest active transaction id, since the
status of this transaction id is required to decide the fate of
unresolved prepared foreign transaction. On standby during WAL replay
files are just inserted or removed. If the standby is required to
finish recovery and take over the master, pg_fdw_xact is scanned to
read unresolved foreign prepared transactions into the shared memory.

Many of fdw_xact.c code is inspired by two_phase.c code. So recovery
mechanism and process are almost same as two_phase. The patch
incorporated recent optimization of two_phase.c.

Handling in-doubt transaction
========================
Any crash or connection failure in phase 2 leaves the prepared
transaction in unresolved state (called the in-doubt transaction). We
need to resolve the in-doubt transaction after foreign server
recovered. We can do that manually by calling pg_fdw_xact_resolve
function on local server but the patch introduces new contrib module
pg_fdw_resolver in order to handle them automatically. pg_fdw_resolver
is a background worker process, which periodically checks if there is
in-doubt transaction and tries to resolve such transaction.

FDW APIs
======
The patch introduces new four FDW APIs; GetPrepreId,
EndForeignTransaction, PrepareForeignTransaction and
ResolvePrepareForeginTransaction

* GetPreparedId is called to get transaction identifier on pre-commit phase.
* EndForeignTransaction is called on commit phase and executes either
COMMIT or ROLLBACK on foreign server.
* PrepareForeignTransaciton is called on pre-commit phase and executes
PREPARE TRANSACTION on foreign server.
* ResolvePrepareForeginTransaction is called on commit phase and
execute either COMMIT PREPARED or ROLLBACK PREPARED with given
transaction identifier on foreign server.

If the foreign server is not capable of two phase commit, last two
APIs are not required.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2017-01-12 11:30:44 Re: Retiring from the Core Team
Previous Message Rafia Sabih 2017-01-12 11:02:40 Re: WIP: [[Parallel] Shared] Hash