Re: [HACKERS] logical decoding of two-phase transactions

From: Arseny Sher <a(dot)sher(at)postgrespro(dot)ru>
To: Nikhil Sontakke <nikhils(at)2ndquadrant(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Sokolov Yura <y(dot)sokolov(at)postgrespro(dot)ru>, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] logical decoding of two-phase transactions
Date: 2019-01-14 22:16:33
Message-ID: 87a7k2996m.fsf@ars-thinkpad
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Nikhil Sontakke <nikhils(at)2ndquadrant(dot)com> writes:

> I'd like to believe that the latest patch set tries to address some
> (if not all) of your concerns. Can you please take a look and let me
> know?

Hi, sure.

General things:

- Earlier I said that there is no point of sending COMMIT PREPARED if
decoding snapshot became consistent after PREPARE, i.e. PREPARE hadn't
been sent. I realized since then that such use cases actually exist:
prepare might be copied to the replica by e.g. basebackup or something
else earlier. Still, a plugin must be able to easily distinguish these
too early PREPARES without doing its own bookkeeping (remembering each
PREPARE it has seen). Fortunately, turns out this we can make it
easy. If during COMMIT PREPARED / ABORT PREPARED record decoding we
see that ReorderBufferTXN with such xid exists, it means that either
1) plugin refused to do replay of this xact at PREPARE or 2) PREPARE
was too early in the stream. Otherwise xact would be replayed at
PREPARE processing and rbtxn purged immediately after. I think we
should add this to the documentation of filter_prepare_cb. Also, to
this end we need to add an argument to this callback specifying at
which context it was called: during prepare / commit prepared / abort
prepared. Also, for this to work, ReorderBufferProcessXid must be
always called at PREPARE, not only when 2PC decoding is disabled.

- BTW, ReorderBufferProcessXid at PREPARE should be always called
anyway, because otherwise if xact is empty, we will not prepare it
(and call cb), even if the output plugin asked us not to filter it
out. However, we will call commit_prepared cb, which is inconsistent.

- I find it weird that in DecodePrepare and in DecodeCommit you always
ask the plugin whether to filter an xact, given that sometimes you
know beforehand that you are not going to replay it: it might have
already been replayed, might have wrong dbid, origin, etc. One
consequence of this: imagine that notorious xact with PREPARE before
point where snapshot became consistent and COMMIT PREPARED after that
point. Even if filter_cb says 'I want 2PC on this xact', with current
code it won't be replayed on PREPARE and rbxid will be destroyed with
ReorderBufferForget. Now this xact is lost.

- Doing full-blown SnapBuildCommitTxn during PREPARE decoding is wrong,
because xact effects must not yet be seen to others. I discussed this
at length and described adjacent problems in [1].

- I still don't like that if 2PC xact was aborted and its replay
stopped, prepare callback won't be called but abort_prepared would be.
This either should be documented or fixed.

Second patch:

+ /* filter_prepare is optional, but requires two-phase decoding */
+ if ((ctx->callbacks.filter_prepare_cb != NULL) && (!opt->enable_twophase))
+ ereport(ERROR,
+ (errmsg("Output plugin does not support two-phase decoding, but "
+ "registered filter_prepared callback.")));

I actually think that enable_twophase output plugin option is
redundant. If plugin author wants 2PC, he just provides
filter_prepare_cb callback and potentially others. I also don't see much
value in checking that exactly 0 or 3 callbacks were registred.

- You allow (commit|abort)_prepared_cb, prepare_cb callbacks to be not
specified with enabled 2PC and call them without check that they
actually exist.

- executed within that transaction.
+ executed within that transaction. A transaction that is prepared for
+ a two-phase commit using <command>PREPARE TRANSACTION</command> will
+ also be decoded if the output plugin callbacks needed for decoding
+ them are provided. It is possible that the current transaction which
+ is being decoded is aborted concurrently via a <command>ROLLBACK PREPARED</command>
+ command. In that case, the logical decoding of this transaction will
+ be aborted too.

This should say explicitly that such 2PC xact will be decoded at PREPARE
record. Probably also add that otherwise it is decoded at CP
record. Probably also add "and abort_cb callback called" to the last
sentence.

+ The required <function>abort_cb</function> callback is called whenever
+ a transaction abort has to be initiated. This can happen if we are

This callback is not required in the code, and it would be indeed a bad
idea to demand it, breaking compatibility with existing plugins not
caring about 2PC.

+ * Otherwise call either PREPARE (for twophase transactions) or COMMIT
+ * (for regular ones).
+ */
+ if (rbtxn_rollback(txn))
+ rb->abort(rb, txn, commit_lsn);

This is dead code since we don't have decoding of in-progress xacts yet.

+ /*
+ * If there is a valid top-level transaction that's different from the
+ * two-phase one we are aborting, clear its reorder buffer as well.
+ */
+ if (TransactionIdIsNormal(xid) && xid != parsed->twophase_xid)
+ ReorderBufferAbort(ctx->reorder, xid, origin_lsn);

What is the aim of this? How xl_xid xid of commit prepared record can be
normal?

+ /*
+ * The transaction may or may not exist (during restarts for example).
+ * Anyways, 2PC transactions do not contain any reorderbuffers. So allow
+ * it to be created below.
+ */

Code around looks sane, but I think that restarts are irrelevant to
rbtxn existence at this moment: if we are going to COMMIT/ABORT PREPARED
it, it must have been replayed and rbtxn purged immediately after. The
only reason why rbtxn can exist here is invalidation addition
(ReorderBufferAddInvalidations) happening a couple of calls earlier.
Also, instead of misty '2PC transactions do not contain any
reorderbuffers' I would say something like 'create dummy
ReorderBufferTXN to pass it to the callback'.

- filter_prepare_cb callback existence is checked in both decode.c and
in filter_prepare_cb_wrapper.

Third patch:

+/*
+ * An xid value pointing to a possibly ongoing or a prepared transaction.
+ * Currently used in logical decoding. It's possible that such transactions
+ * can get aborted while the decoding is ongoing.
+ */

I would explain here that this xid is checked for abort after each
catalog scan, and sent for the details to SetupHistoricSnapshot.

Nitpicking:

First patch: I still don't think that these flags need a bitmask.

Second patch:

- I still think ReorderBufferCommitInternal name is confusing and should
be renamed to something like ReorderBufferReplay.

/* Do we know this is a subxact? Xid of top-level txn if so */
TransactionId toplevel_xid;
+ /* In case of 2PC we need to pass GID to output plugin */
+ char *gid;

Better add here newline as between other fields.

+ txn->txn_flags |= RBTXN_PREPARE;
+ txn->gid = palloc(strlen(gid) + 1); /* trailing '\0' */
+ strcpy(txn->gid, gid);

pstrdup?

- ReorderBufferTxnIsPrepared and ReorderBufferPrepareNeedSkip do the
same and should be merged with comments explaining that the answer
must be stable.

+ The optional <function>commit_prepared_cb</function> callback is called whenever
+ a commit prepared transaction has been decoded. The <parameter>gid</parameter> field,

a commit prepared transaction *record* has been decoded?

Fourth patch:

Applying: Teach test_decoding plugin to work with 2PC
.git/rebase-apply/patch:347: trailing whitespace.
-- test savepoints
.git/rebase-apply/patch:424: trailing whitespace.
# get XID of the above two-phase transaction
warning: 2 lines add whitespace errors.

[1] https://www.postgresql.org/message-id/87zhxrwgvh.fsf%40ars-thinkpad

--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-01-14 22:17:16 Re: Libpq support to connect to standby server as priority
Previous Message Alvaro Herrera 2019-01-14 22:13:07 Re: explain plans with information about (modified) gucs