Re: [HACKERS] logical decoding of two-phase transactions

From: Nikhil Sontakke <nikhils(at)2ndquadrant(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Sokolov Yura <y(dot)sokolov(at)postgrespro(dot)ru>, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: [HACKERS] logical decoding of two-phase transactions
Date: 2017-12-04 15:15:30
Views: Raw Message | Whole Thread | Download mbox
Lists: pgsql-hackers

PFA, latest patch for this functionality.
This patch contains the following changes as compared to the earlier patch:

- Fixed a bunch of typos and comments

- Modified HeapTupleSatisfiesVacuum to return HEAPTUPLE_RECENTLY_DEAD
if the transaction id is newer than OldestXmin. Doing this only for
CATALOG tables (htup->t_tableOid < (Oid) FirstNormalObjectId).

- Added a filter callback filter_decode_txn_cb_wrapper() to decide if
it's ok to decode the NEXT change record. This filter as of now checks
if the XID that is involved got aborted. Additional checks can be
added here as needed.

- Added ABORT callback in the decoding process. This was not needed
before because we always used to decode committed transactions. With
2PC transactions, it possible that while we are decoding it, another
backend might issue a concurrent ROLLBACK PREPARED. So when
filter_decode_txn_cb_wrapper() gets called, it will tell us to not to
decode the next change record. In that case we need to send an ABORT
to the subscriber (and not ROLLBACK PREPARED because we are yet to
issue PREPARE to the subscriber)

- Added all functionality to read the abort command and apply it on
the remote subscriber as needed.

- Added functionality in ReorderBufferCommit() to abort midways based
on the feedback from filter_decode_txn_cb_wrapper()

- Modified LockGXact() and FinishPreparedTransaction() to allow
missing GID in case of "ROLLBACK PREPARED". Currently, this will only
happen in the logical apply code path. We still send it to the
subscriber because it's difficult to identify on the provider if this
transaction was aborted midways in decoding or if it's in PREPARED
state on the subscriber. It will error out as before in all other

- Totally removed snapshot addition/deletion code while doing the
decoding. That's not needed at all while decoding an ongoing
transaction. The entries in the snapshot are needed for future
transactions to be able to decode older transactions. For 2PC
transactions, we don't need to decode them till COMMIT PREPARED gets
called. This has simplified all that unwanted snapshot push/pop code,
which is nice.


On 30 November 2017 at 16:08, Nikhil Sontakke <nikhils(at)2ndquadrant(dot)com> wrote:
> Hi,
>>> So perhaps better approach would be to not return
>>> HEAPTUPLE_DEAD if the transaction id is newer than the OldestXmin (same
>>> logic we use for deleted tuples of committed transactions) in the
>>> HeapTupleSatisfiesVacuum() even for aborted transactions. I also briefly
>>> checked HOT pruning and AFAICS the normal HOT pruning (the one not
>>> called by vacuum) also uses the xmin as authoritative even for aborted
>>> txes so nothing needs to be done there probably.
>>> In case we are worried that this affects cleanups of for example large
>>> aborted COPY transactions and we think it's worth worrying about then we
>>> could limit the new OldestXmin based logic only to catalog tuples as
>>> those are the only ones we need available in decoding.
>> Yeah, if it's limited to catalog tuples only then that sounds good. I was
>> quite concerned about how it'd impact vacuuming otherwise, but if limited to
>> catalogs about the only impact should be on workloads that create lots of
>> TEMPORARY tables then ROLLBACK - and not much on those.
> Based on these discussions, I think there are two separate issues here:
> 1) Make HeapTupleSatisfiesVacuum() to behave differently for recently
> aborted catalog tuples.
> 2) Invent a mechanism to stop a specific logical decoding activity in
> the middle. The reason to stop it could be a concurrent abort, maybe a
> global transaction manager decides to rollback, or any other reason,
> for example.
> ISTM, that for 2, if (1) is able to leave the recently abort tuples
> around for a little bit while (we only really need them till the
> decode of the current change record is ongoing), then we could
> accomplish it via a callback. This callback should be called before
> commencing decode and network send of each change record. In case of
> in-core logical decoding, the callback for pgoutput could check for
> the transaction having aborted (a call to TransactionIdDidAbort() or
> similar such functions), additional logic can be added as needed for
> various scenarios. If it's aborted, we will abandon decoding and send
> an ABORT to the subscribers before returning.
> Regards,
> Nikhils

Nikhil Sontakke
PostgreSQL/Postgres-XL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
2pc_logical_04_12_17.patch application/octet-stream 89.3 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2017-12-04 15:31:20 Re: [HACKERS] pgbench more operators & functions
Previous Message Masahiko Sawada 2017-12-04 15:11:27 Re: Re: User defined data types in Logical Replication