Re: [HACKERS] logical decoding of two-phase transactions

From: Andres Freund <andres(at)anarazel(dot)de>
To: Nikhil Sontakke <nikhils(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Sokolov Yura <y(dot)sokolov(at)postgrespro(dot)ru>, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] logical decoding of two-phase transactions
Date: 2018-07-26 20:02:41
Message-ID: 20180726200241.aje4dv4jsv25v4k2@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

On 2018-07-26 20:24:00 +0530, Nikhil Sontakke wrote:
> Hi,
>
> >> I think we even can just do something like a global
> >> TransactionId check_if_transaction_is_alive = InvalidTransactionId;
> >> and just set it up during decoding. And then just check it whenever it's
> >> not set tot InvalidTransactionId.
> >>
> >>
> >
> > Ok. I will work on something along these lines and re-submit the set of patches.

> PFA, latest patchset, which completely removes the earlier
> LogicalLock/LogicalUnLock implementation using groupDecode stuff and
> uses the newly suggested approach of checking the currently decoded
> XID for abort in systable_* API functions. Much simpler to code and
> easier to test as well.

So, leaving the fact that it might not actually be correct aside ;), you
seem to be ok with the approach?

> Out of the patchset, the specific patch which focuses on the above
> systable_* API based XID checking implementation is part of
> 0003-Gracefully-handle-concurrent-aborts-of-uncommitted-t.patch. So,
> it might help to take a look at this patch first for any additional
> feedback on this approach.

K.

> There's an additional test case in
> 0005-Additional-test-case-to-demonstrate-decoding-rollbac.patch which
> uses a sleep in the "change" plugin API to allow a concurrent rollback
> on the 2PC being currently decoded. Andres generally doesn't like this
> approach :-), but there are no timing/interlocking issues now, and the
> sleep just helps us do a concurrent rollback, so it might be ok now,
> all things considered. Anyways, it's an additional patch for now.

Yea, I still don't think it's ok. The tests won't be reliable. There's
ways to make this reliable, e.g. by forcing a lock to be acquired that's
externally held or such. Might even be doable just with a weird custom
datatype.

> From 75edeb440794fff7de48082dafdecb065940bee5 Mon Sep 17 00:00:00 2001
> From: Nikhil Sontakke <nikhils(at)2ndQuadrant(dot)com>
> Date: Thu, 26 Jul 2018 18:45:26 +0530
> Subject: [PATCH 3/5] Gracefully handle concurrent aborts of uncommitted
> transactions that are being decoded alongside.
>
> When a transaction aborts, it's changes are considered unnecessary for
> other transactions. That means the changes may be either cleaned up by
> vacuum or removed from HOT chains (thus made inaccessible through
> indexes), and there may be other such consequences.
>
> When decoding committed transactions this is not an issue, and we
> never decode transactions that abort before the decoding starts.
>
> But for in-progress transactions - for example when decoding prepared
> transactions on PREPARE (and not COMMIT PREPARED as before), this
> may cause failures when the output plugin consults catalogs (both
> system and user-defined).
>
> We handle such failures by returning ERRCODE_TRANSACTION_ROLLBACK
> sqlerrcode from system table scan APIs to the backend decoding a
> specific uncommitted transaction. The decoding logic on the receipt
> of such an sqlerrcode aborts the ongoing decoding and returns
> gracefully.
> ---
> src/backend/access/index/genam.c | 31 +++++++++++++++++++++++++
> src/backend/replication/logical/reorderbuffer.c | 30 ++++++++++++++++++++----
> src/backend/utils/time/snapmgr.c | 25 ++++++++++++++++++--
> src/include/utils/snapmgr.h | 4 +++-
> 4 files changed, 82 insertions(+), 8 deletions(-)
>
> diff --git a/src/backend/access/index/genam.c b/src/backend/access/index/genam.c
> index 9d08775687..67c5810bf7 100644
> --- a/src/backend/access/index/genam.c
> +++ b/src/backend/access/index/genam.c
> @@ -423,6 +423,16 @@ systable_getnext(SysScanDesc sysscan)
> else
> htup = heap_getnext(sysscan->scan, ForwardScanDirection);
>
> + /*
> + * If CheckXidAlive is valid, then we check if it aborted. If it did, we
> + * error out
> + */
> + if (TransactionIdIsValid(CheckXidAlive) &&
> + TransactionIdDidAbort(CheckXidAlive))
> + ereport(ERROR,
> + (errcode(ERRCODE_TRANSACTION_ROLLBACK),
> + errmsg("transaction aborted during system catalog scan")));
> +
> return htup;
> }

Don't we have to check TransactionIdIsInProgress() first? C.f. header
comments in tqual.c. Note this is also not guaranteed to be correct
after a crash (where no clog entry will exist for an aborted xact), but
we probably shouldn't get here in that case - but better be safe.

I suspect it'd be better reformulated as
TransactionIdIsValid(CheckXidAlive) &&
!TransactionIdIsInProgress(CheckXidAlive) &&
!TransactionIdDidCommit(CheckXidAlive)

What do you think?

I think it'd also be good to add assertions to codepaths not going
through systable_* asserting that
!TransactionIdIsValid(CheckXidAlive). Alternatively we could add an
if (unlikely(TransactionIdIsValid(CheckXidAlive)) && ...)
branch to those too.

> From 80fc576bda483798919653991bef6dc198625d90 Mon Sep 17 00:00:00 2001
> From: Nikhil Sontakke <nikhils(at)2ndQuadrant(dot)com>
> Date: Wed, 13 Jun 2018 16:31:15 +0530
> Subject: [PATCH 4/5] Teach test_decoding plugin to work with 2PC
>
> Includes a new option "enable_twophase". Depending on this options
> value, PREPARE TRANSACTION will either be decoded or treated as
> a single phase commit later.

FWIW, I don't think I'm ok with doing this on a per-plugin-option basis.
I think this is something that should be known to the outside of the
plugin. More similar to how binary / non-binary support works. Should
also be able to inquire the output plugin whether it's supported (cf
previous similarity).

> From 682b0de2827d1f55c4e471c3129eb687ae0825a5 Mon Sep 17 00:00:00 2001
> From: Nikhil Sontakke <nikhils(at)2ndQuadrant(dot)com>
> Date: Wed, 13 Jun 2018 16:32:16 +0530
> Subject: [PATCH 5/5] Additional test case to demonstrate decoding/rollback
> interlocking
>
> Introduce a decode-delay parameter in the test_decoding plugin. Based
> on the value provided in the plugin, sleep for those many seconds while
> inside the "decode change" plugin call. A concurrent rollback is fired
> off which aborts that transaction in the meanwhile. A subsequent
> systable access will error out causing the logical decoding to abort.

Yea, I'm *definitely* still not on board with this. This'll just lead to
a fragile or extremely slow test.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dave Cramer 2018-07-26 20:40:18 why doesn't pg_create_logical_replication_slot throw an error if the encoder doesn't exist
Previous Message Alexander Korotkov 2018-07-26 19:59:22 Re: Locking B-tree leafs immediately in exclusive mode