Re: Transactions involving multiple postgres foreign servers

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>, Vinayak Pokale <pokale_vinayak_q3(at)lab(dot)ntt(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Subject: Re: Transactions involving multiple postgres foreign servers
Date: 2017-08-03 10:08:21
Message-ID: CAB7nPqTv9H-qjB=BUjjmGR9pkRLrpj_vKZmosuQzsKFGpLxauQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jul 31, 2017 at 7:27 PM, Alvaro Herrera
<alvherre(at)2ndquadrant(dot)com> wrote:
> Robert Haas wrote:
>
>> An alternative approach is to have some kind of other identifier,
>> let's call it a distributed transaction ID (DXID) which is mapped by
>> each node onto a local XID.
>
> Postgres-XL seems to manage this problem by using a transaction manager
> node, which is in charge of assigning snapshots. I don't know how that
> works, but perhaps adding that concept here could be useful too. One
> critical point to that design is that the app connects not directly to
> the underlying Postgres server but instead to some other node which is
> or connects to the node that manages the snapshots.
>
> Maybe Michael can explain in better detail how it works, and/or how (and
> if) it could be applied here.
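
To put Robert's DXID idea in concrete terms first: each node would keep
a small map from the global identifier to whatever XID it assigned
locally. A minimal sketch, with all names (DistributedXid, DxidMapEntry)
hypothetical and not taken from any posted patch:

/*
 * Per-node DXID -> local XID map, fixed-size with linear probing and
 * no deletion; enough to illustrate the lookup path.
 */
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t DistributedXid;   /* cluster-wide transaction ID */
typedef uint32_t TransactionId;    /* XID assigned locally by the node */

typedef struct DxidMapEntry
{
    DistributedXid dxid;           /* key: global transaction ID */
    TransactionId  local_xid;      /* value: XID used on this node */
    bool           in_use;
} DxidMapEntry;

#define DXID_MAP_SIZE 1024

static DxidMapEntry dxid_map[DXID_MAP_SIZE];

/* Record the local XID a node assigned to a distributed transaction. */
static bool
dxid_map_insert(DistributedXid dxid, TransactionId xid)
{
    for (uint32_t i = 0; i < DXID_MAP_SIZE; i++)
    {
        DxidMapEntry *e = &dxid_map[(dxid + i) % DXID_MAP_SIZE];

        if (!e->in_use)
        {
            e->dxid = dxid;
            e->local_xid = xid;
            e->in_use = true;
            return true;
        }
    }
    return false;                  /* table full */
}

/* Resolve a DXID to this node's XID; 0 means the DXID is unknown here. */
static TransactionId
dxid_to_local_xid(DistributedXid dxid)
{
    for (uint32_t i = 0; i < DXID_MAP_SIZE; i++)
    {
        DxidMapEntry *e = &dxid_map[(dxid + i) % DXID_MAP_SIZE];

        if (!e->in_use)
            return 0;              /* empty slot ends the linear probe */
        if (e->dxid == dxid)
            return e->local_xid;
    }
    return 0;
}

The lookup would happen when a request arrives tagged with a DXID and
needs to run under the matching local transaction.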

XL (and XC) use a global transaction ID that plugs directly into the
internal XID machinery of Postgres, actually bypassing what Postgres
assigns to each backend if a transaction needs one. So if transactions
are not evenly spread among multiple nodes, performance gets
impacted. Now when we worked on this project we noticed that we gained
in performance by reducing the number of requests and grouping them
together, so a proxy layer has been added between the global
transaction manager and Postgres to group those requests. This does
not change the fact that read-committed transactions still need a
snapshot for each query, which is costly. So this approach hurts
less with analytic queries, and more with OLTP.
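
To give an idea of what the proxy buys you, here is a minimal sketch of
the grouping pattern, not actual XC/XL code: backends that ask for a
snapshot while a GTM round trip is already in flight just wait for and
share its result. GlobalSnapshot and gtm_fetch_snapshot() are made up
for the example, the latter being a stub standing in for the network
call:

#include <pthread.h>

typedef struct GlobalSnapshot
{
    unsigned long xmin;
    unsigned long xmax;
} GlobalSnapshot;

/* Stub standing in for the real network round trip to the GTM. */
static GlobalSnapshot
gtm_fetch_snapshot(void)
{
    GlobalSnapshot s = {100, 200};
    return s;
}

static pthread_mutex_t grp_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  grp_done = PTHREAD_COND_INITIALIZER;
static int             fetch_in_progress = 0;
static unsigned long   fetch_generation = 0;
static GlobalSnapshot  shared_snap;

/*
 * If a GTM request is already in flight, wait for it and reuse its
 * result; otherwise perform the request on behalf of every backend
 * that arrives in the meantime.
 */
GlobalSnapshot
proxy_get_snapshot(void)
{
    GlobalSnapshot snap;

    pthread_mutex_lock(&grp_lock);
    if (fetch_in_progress)
    {
        unsigned long gen = fetch_generation;

        while (fetch_generation == gen)
            pthread_cond_wait(&grp_done, &grp_lock);
        snap = shared_snap;        /* piggy-back on the shared answer */
    }
    else
    {
        fetch_in_progress = 1;
        pthread_mutex_unlock(&grp_lock);

        snap = gtm_fetch_snapshot();   /* one request for the whole group */

        pthread_mutex_lock(&grp_lock);
        shared_snap = snap;
        fetch_in_progress = 0;
        fetch_generation++;
        pthread_cond_broadcast(&grp_done);
    }
    pthread_mutex_unlock(&grp_lock);
    return snap;
}

All the waiters get the same snapshot, which is exactly what allows one
round trip to serve the whole group.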

2PC transaction status was tracked as well in the GTM. This allows
fancy things like being able to prepare a transaction on node 1 and
commit it on node 2, for example. I am honestly not sure that you need
to add anything at clog level, but I think that storing the metadata
of a transaction at the FDW level is a rather correct approach on the
matter. That's what Greenplum actually does if I recall correctly
(Heikki, save me!): it has one coordinator that handles such metadata,
and a bunch of underlying nodes that store the data. Citus also does
that if I recall correctly. So instead of decentralizing this
information, it gets stored in a Postgres coordinator instance.
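
For reference, the coordinator-driven pattern this thread is about
looks like this with plain libpq. Connection strings, the table and the
GID are placeholders, and error handling is reduced to the bare
minimum; a real coordinator has to persist the prepared GIDs so that
in-doubt transactions can be resolved after a crash:

#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

static void
run(PGconn *conn, const char *sql)
{
    PGresult *res = PQexec(conn, sql);

    if (PQresultStatus(res) != PGRES_COMMAND_OK)
    {
        fprintf(stderr, "%s failed: %s", sql, PQerrorMessage(conn));
        exit(1);                   /* real code would ROLLBACK PREPARED */
    }
    PQclear(res);
}

int
main(void)
{
    PGconn *node1 = PQconnectdb("host=node1 dbname=postgres");
    PGconn *node2 = PQconnectdb("host=node2 dbname=postgres");

    if (PQstatus(node1) != CONNECTION_OK || PQstatus(node2) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed\n");
        exit(1);
    }

    /* Do the distributed work inside ordinary local transactions. */
    run(node1, "BEGIN");
    run(node2, "BEGIN");
    run(node1, "UPDATE accounts SET balance = balance - 10 WHERE id = 1");
    run(node2, "UPDATE accounts SET balance = balance + 10 WHERE id = 2");

    /* Phase 1: prepare every participant under a shared GID. */
    run(node1, "PREPARE TRANSACTION 'dist_tx_42'");
    run(node2, "PREPARE TRANSACTION 'dist_tx_42'");

    /*
     * Phase 2: commit the prepared transactions.  A crash between the
     * two COMMIT PREPARED calls leaves an entry in pg_prepared_xacts
     * on node2 that a resolver (or a GTM, in XC/XL) has to finish.
     */
    run(node1, "COMMIT PREPARED 'dist_tx_42'");
    run(node2, "COMMIT PREPARED 'dist_tx_42'");

    PQfinish(node1);
    PQfinish(node2);
    return 0;
}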
--
Michael
