Re: [HACKERS] logical decoding of two-phase transactions

From: Nikhil Sontakke <nikhils(at)2ndquadrant(dot)com>
To: Arseny Sher <a(dot)sher(at)postgrespro(dot)ru>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, David Steele <david(at)pgmasters(dot)net>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Sokolov Yura <y(dot)sokolov(at)postgrespro(dot)ru>, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] logical decoding of two-phase transactions
Date: 2018-08-07 13:55:46
Message-ID: CAMGcDxebcmJ3rnSrvQvX9v4=d8=_XgcVqj=f0ixLQv7Hdzji3Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Arseny,

> - Decoding transactions at PREPARE record changes rules of the "we ship
> all commits after lsn 'x'" game. Namely, it will break initial
> tablesync: what if consistent snapshot was formed *after* PREPARE, but
> before COMMIT PREPARED, and the plugin decides to employ 2pc? Instead
> of getting inital contents + continious stream of changes the receiver
> will miss the prepared xact contents and raise 'prepared xact doesn't
> exist' error. I think the starting point to address this is to forbid
> two-phase decoding of xacts with lsn of PREPARE less than
> snapbuilder's start_decoding_at.
>

It will be the job of the plugin to return a consistent answer for
every GID that is encountered. In this case, the plugin will decode
the transaction at COMMIT PREPARED time and not at PREPARE time.

> - Currently we will call abort_prepared cb even if we failed to actually
> prepare xact due to concurrent abort. I think it is confusing for
> users. We should either handle this by remembering not to invoke
> abort_prepared in these cases or at least document this behaviour,
> leaving this problem to the receiver side.
>

The point is, when we reach the "ROLLBACK PREPARED", we have no idea
if the "PREPARE" was aborted by this rollback happening concurrently.
So it's possible that the 2PC has been successfully decoded and we
would have to send the rollback to the other side which would need to
check if it needs to rollback locally.

> - I find it suspicious that DecodePrepare completely ignores actions of
> SnapBuildCommitTxn. For example, to execute invalidations, the latter
> sets base snapshot if our xact (or subxacts) did DDL and the snapshot
> not set yet. My fantasy doesn't hint me the concrete example
> where this would burn at the moment, but it should be considered.
>

I had discussed this area with Petr and we didn't see any issues as well, then.

> Now, the bikeshedding.
>
> First patch:
> - I am one of those people upthread who don't think that converting
> flags to bitmask is beneficial -- especially given that many of them
> are mutually exclusive, e.g. xact can't be committed and aborted at
> the same time. Apparently you have left this to the committer though.
>

Hmm, there seems to be divided opinion on this. I am willing to go
back to using the booleans if there's opposition and if the committer
so wishes. Note that this patch will end up adding 4/5 more booleans
in that case (we add new ones for prepare, commit prepare, abort,
rollback prepare etc).

>
> Second patch:
> - Applying gives me
> Applying: Support decoding of two-phase transactions at PREPARE
> .git/rebase-apply/patch:871: trailing whitespace.
>
> + row. The <function>change_cb</function> callback may access system or
> + user catalog tables to aid in the process of outputting the row
> + modification details. In case of decoding a prepared (but yet
> + uncommitted) transaction or decoding of an uncommitted transaction, this
> + change callback is ensured sane access to catalog tables regardless of
> + simultaneous rollback by another backend of this very same transaction.
>
> I don't think we should explain this, at least in such words. As
> mentioned upthread, we should warn about allowed systable_* accesses
> instead. Same for message_cb.
>

Looks like you are looking at an earlier patchset. The latest patchset
has removed the above.

>
> + /*
> + * Tell the reorderbuffer about the surviving subtransactions. We need to
> + * do this because the main transaction itself has not committed since we
> + * are in the prepare phase right now. So we need to be sure the snapshot
> + * is setup correctly for the main transaction in case all changes
> + * happened in subtransanctions
> + */
>
> While we do certainly need to associate subxacts here, the explanation
> looks weird to me. I would leave just the 'Tell the reorderbuffer about
> the surviving subtransactions' as in DecodeCommit.
>
>
> }
> -
> /*
> * There's a speculative insertion remaining, just clean in up, it
> * can't have been successful, otherwise we'd gotten a confirmation
>
> Spurious newline deletion.
>
>
> - I would rename ReorderBufferCommitInternal to ReorderBufferReplay:
> we replay the xact there, not commit.
>
> - If xact is empty, we will not prepare it (and call cb),
> even if the output plugin asked us. However, we will call
> commit_prepared cb.
>
> - ReorderBufferTxnIsPrepared and ReorderBufferPrepareNeedSkip do the
> same and should be merged with comments explaining that the answer
> must be stable.
>
> - filter_prepare_cb callback existence is checked in both decode.c and
> in filter_prepare_cb_wrapper.
>
> + /*
> + * The transaction may or may not exist (during restarts for example).
> + * Anyways, 2PC transactions do not contain any reorderbuffers. So allow
> + * it to be created below.
> + */
>
>
> Code around looks sane, but I think that ReorderBufferTXN for our xact
> must *not* exist at this moment: if we are going to COMMIT/ABORT
> PREPARED it, it must have been replayed and RBTXN purged immediately
> after. Also, instead of misty '2PC transactions do not contain any
> reorderbuffers' I would say something like 'create dummy
> ReorderBufferTXN to pass it to the callback'.
>
> - In DecodeAbort:
> + /*
> + * If it's ROLLBACK PREPARED then handle it via callbacks.
> + */
> + if (TransactionIdIsValid(xid) &&
> + !SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) &&
> +
>
> How xid can be invalid here?
>
>
> - It might be worthwile to put the check
> + !SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) &&
> + parsed->dbId == ctx->slot->data.database &&
> + !FilterByOrigin(ctx, origin_id) &&
>
> which appears 3 times now into separate function.
>
>
> + * two-phase transactions - we either have to have all of them or none.
> + * The filter_prepare callback is optional, but can only be defined when
>
> Kind of controversial (all of them or none, but optional), might be
> formulated more accurately.
>
>
> + /*
> + * Capabilities of the output plugin.
> + */
> + bool enable_twophase;
>
> I would rename this to 'supports_twophase' since this is not an option
> but a description of the plugin capabilities.
>
>
> + /* filter_prepare is optional, but requires two-phase decoding */
> + if ((ctx->callbacks.filter_prepare_cb != NULL) && (!ctx->enable_twophase))
> + ereport(ERROR,
> + (errmsg("Output plugin does not support two-phase decoding, but "
> + "registered
> filter_prepared callback.")));
>
> Don't think we need to check that...
>
>
> + * Otherwise call either PREPARE (for twophase transactions) or COMMIT
> + * (for regular ones).
> + */
> + if (rbtxn_rollback(txn))
> + rb->abort(rb, txn, commit_lsn);
>
> This is the dead code since we don't have decoding of in-progress xacts
> yet.
>

Yes, the above check can be done away with it.

>
> Third patch:
> +/*
> + * An xid value pointing to a possibly ongoing or a prepared transaction.
> + * Currently used in logical decoding. It's possible that such transactions
> + * can get aborted while the decoding is ongoing.
> + */
>
> I would explain here that this xid is checked for abort after each
> catalog scan, and sent for the details to SetupHistoricSnapshot.
>
>
> + /*
> + * If CheckXidAlive is valid, then we check if it aborted. If it did, we
> + * error out
> + */
> + if (TransactionIdIsValid(CheckXidAlive) &&
> + !TransactionIdIsInProgress(CheckXidAlive) &&
> + !TransactionIdDidCommit(CheckXidAlive))
> + ereport(ERROR,
> + (errcode(ERRCODE_TRANSACTION_ROLLBACK),
> + errmsg("transaction aborted during system catalog scan")));
>
> Probably centralize checks in one function? As well as 'We don't expect
> direct calls to heap_fetch...' ones.
>
>
> P.S. Looks like you have torn the thread chain: In-Reply-To header of
> mail [1] is missing. Please don't do that.
>

That wasn't me. I was also annoyed and surprised to see a new email
thread separate from the earlier containing 100 or so messages.

Regards,
Nikhils
--
Nikhil Sontakke http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2018-08-07 13:59:09 Re: [PATCH] Include application_name in "connection authorized" log message
Previous Message Stephen Frost 2018-08-07 13:46:08 Re: [PATCH] Include application_name in "connection authorized" log message