Re: logical decoding of two-phase transactions

From: Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: logical decoding of two-phase transactions
Date: 2017-02-09 13:23:11
Message-ID: 1FE466EA-7058-484D-B0DB-42CD81FA59F0@postgrespro.ru
Lists: pgsql-hackers


> On 31 Jan 2017, at 12:22, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
>
> Personally I don't think lack of access to the GID justifies blocking 2PC logical decoding. It can be added separately. But it'd be nice to have especially if it's cheap.

Agreed.

> On 2 Feb 2017, at 00:35, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
>
> Stas was concerned about what happens in logical decoding if we crash between PREPARE TRANSACTION and COMMIT PREPARED. But we'll always go back and decode the whole txn again anyway so it doesn't matter.

Not exactly. It seems that in previous discussions we were not on the same page, probably because my arguments were unclear.

From my point of view there are no problems (or at least no new problems compared to ordinary 2PC) with preparing transactions on slave servers under something like “#{xid}#{node_id}” instead of the GID, if the issuing node is the coordinator of that transaction. In case of failure, restart, or crash we have the same options for deciding what to do with uncommitted transactions.

My concern is about the situation with an external coordinator. That scenario is quite important for users of postgres native 2PC, notably J2EE users. Suppose a user (or his framework) issues “prepare transaction ‘mytxname’;” to servers with ordinary synchronous physical replication. If the master crashes and a replica is promoted, then the user can reconnect to it and commit/abort that transaction using his GID. It is unclear to me how to achieve the same behaviour with logical replication of 2PC without the GID in the commit record. If we prepare with “#{xid}#{node_id}” on acceptor nodes, then if the donor node crashes we lose the mapping between the user’s GID and our internal GID. Conversely, we can prepare with the user's GID on acceptors, but then we will not know that GID on the donor during commit decoding (by the time decoding happens all in-memory state is already gone and we can’t map our xid back to the GID).
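
To make the first failure mode concrete, here is a toy model of it. Everything in it is hypothetical (the Donor and Acceptor classes, the "_" separator in the internal name); it is not PostgreSQL code, only a sketch of why the user's GID stops being usable once the donor's in-memory mapping is gone.

```python
# Toy model only: these classes are hypothetical, not PostgreSQL internals.

class Donor:
    """Node where the user issued PREPARE TRANSACTION."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.user_gid_by_xid = {}  # lives only in memory

    def prepare(self, xid, user_gid):
        self.user_gid_by_xid[xid] = user_gid
        # Internal name sent downstream; the "_" separator is an assumption.
        return f"{xid}_{self.node_id}"

    def crash(self):
        # All in-memory state, including the xid -> GID mapping, is lost.
        self.user_gid_by_xid.clear()


class Acceptor:
    """Node that receives the prepared transaction via logical replication."""

    def __init__(self):
        self.prepared = set()

    def prepare(self, gid):
        self.prepared.add(gid)

    def can_resolve(self, gid):
        # COMMIT PREPARED / ROLLBACK PREPARED only work for a known name.
        return gid in self.prepared


def scenario():
    donor, acceptor = Donor(node_id=1), Acceptor()
    internal_gid = donor.prepare(xid=1001, user_gid="mytxname")
    acceptor.prepare(internal_gid)
    donor.crash()
    # The user only knows 'mytxname'; the acceptor only knows the internal name.
    return acceptor.can_resolve("mytxname"), acceptor.can_resolve(internal_gid)
```

In this model the user's name cannot be resolved on the acceptor after the crash, while the internal name (which the user never saw) can.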

I performed some tests to understand the real impact on WAL size. I compared postgres -master with wal_level = logical, after 3M 2PC transactions, against patched postgres where GIDs are stored inside the commit record too. I tested with 194-byte and 6-byte GIDs (the maximum GID size is 200 bytes).

-master, 6-byte GID after 3M transactions: pg_current_xlog_location = 0/9572CB28
-patched, 6-byte GID after 3M transactions: pg_current_xlog_location = 0/96C442E0

so with 6-byte GIDs the difference in WAL size is less than 1%

-master, 194-byte GID after 3M transactions: pg_current_xlog_location = 0/B7501578
-patched, 194-byte GID after 3M transactions: pg_current_xlog_location = 0/D8B43E28

and with 194-byte GIDs the difference in WAL size is about 18%
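
For anyone who wants to recheck the percentages, the arithmetic can be sketched as below. The helper names are made up for illustration, and the calculation assumes each run started from a WAL position close to 0/0, so that the reported location approximates the total WAL written.

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert an 'X/Y' WAL location (two hex halves) to a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_overhead(master_lsn: str, patched_lsn: str, transactions: int):
    """Return (extra bytes, extra percent vs master, extra bytes per transaction).

    Assumes both runs started near WAL position 0/0.
    """
    master = lsn_to_bytes(master_lsn)
    patched = lsn_to_bytes(patched_lsn)
    extra = patched - master
    return extra, 100.0 * extra / master, extra / transactions


if __name__ == "__main__":
    runs = {
        "6-byte GID": ("0/9572CB28", "0/96C442E0"),
        "194-byte GID": ("0/B7501578", "0/D8B43E28"),
    }
    for label, (master_lsn, patched_lsn) in runs.items():
        extra, percent, per_tx = wal_overhead(master_lsn, patched_lsn, 3_000_000)
        print(f"{label}: +{extra} bytes, +{percent:.2f}%, {per_tx:.1f} bytes/txn")
```

On a live server the same difference could be taken directly with the LSN-difference SQL function instead of converting by hand.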

So using big GIDs (as J2EE does) can cause notable WAL bloat, while the cost of small GIDs is almost unnoticeable.

Maybe we can introduce a configuration option track_commit_gid, by analogy with track_commit_timestamp, and make that behaviour optional? Any objections to that?

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
