Re: [HACKERS] logical decoding of two-phase transactions

From: Andres Freund <andres(at)anarazel(dot)de>
To: Nikhil Sontakke <nikhils(at)2ndquadrant(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Sokolov Yura <y(dot)sokolov(at)postgrespro(dot)ru>, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: [HACKERS] logical decoding of two-phase transactions
Date: 2018-02-12 16:20:40
Message-ID: 20180212162040.rsnlnysmf7n4yo7i@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2018-02-12 13:36:16 +0530, Nikhil Sontakke wrote:
> Hi Andres,
>
> > First off: This patch has way too many different types of changes as
> > part of one huge commit. This needs to be split into several
> > pieces. First the cleanups (e.g. the fields -> flag changes), then the
> > individual infrastructure pieces (like the twophase.c changes, best
> > split into several pieces as well, the locking stuff), then the main
> > feature, then support for it in the output plugin. Each should have an
> > individual explanation about why the change is necessary and not a bad
> > idea.
> >
>
> Ok, I will break this patch into multiple logical pieces and re-submit.

Thanks.

> >
> > On 2018-02-06 17:50:40 +0530, Nikhil Sontakke wrote:
> >> @@ -46,6 +48,9 @@ typedef struct
> >> bool skip_empty_xacts;
> >> bool xact_wrote_changes;
> >> bool only_local;
> >> + bool twophase_decoding;
> >> + bool twophase_decode_with_catalog_changes;
> >> + int decode_delay; /* seconds to sleep after every change record */
> >
> > This seems too big a crock to add just for testing. It'll also make the
> > testing timing dependent...
> >
>
> The idea *was* to make testing timing dependent. We wanted to simulate
> the case when a rollback is issued by another backend while the
> decoding is still ongoing. This allows that test case to be tested.

What I mean is that this will be hell on the buildfarm because the
different animals are differently fast.

> >> } TestDecodingData;
> >
> >> void
> >> _PG_init(void)
> >> @@ -85,9 +106,15 @@ _PG_output_plugin_init(OutputPluginCallbacks *cb)
> >> cb->begin_cb = pg_decode_begin_txn;
> >> cb->change_cb = pg_decode_change;
> >> cb->commit_cb = pg_decode_commit_txn;
> >> + cb->abort_cb = pg_decode_abort_txn;
> >
> >> cb->filter_by_origin_cb = pg_decode_filter;
> >> cb->shutdown_cb = pg_decode_shutdown;
> >> cb->message_cb = pg_decode_message;
> >> + cb->filter_prepare_cb = pg_filter_prepare;
> >> + cb->filter_decode_txn_cb = pg_filter_decode_txn;
> >> + cb->prepare_cb = pg_decode_prepare_txn;
> >> + cb->commit_prepared_cb = pg_decode_commit_prepared_txn;
> >> + cb->abort_prepared_cb = pg_decode_abort_prepared_txn;
> >> }
> >
> > Why does this introduce both abort_cb and abort_prepared_cb? That seems
> > to conflate two separate features.
> >
>
> Consider the case when we have a bunch of change records to apply for
> a transaction. We sent a "BEGIN" and then start decoding each change
> record one by one. Now a rollback was encountered while we were
> decoding.

This will be quite the mess once streaming of changes is introduced.

> >> +/* Filter out unnecessary two-phase transactions */
> >> +static bool
> >> +pg_filter_prepare(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
> >> + TransactionId xid, const char *gid)
> >> +{
> >> + TestDecodingData *data = ctx->output_plugin_private;
> >> +
> >> + /* treat all transactions as one-phase */
> >> + if (!data->twophase_decoding)
> >> + return true;
> >> +
> >> + if (txn && txn_has_catalog_changes(txn) &&
> >> + !data->twophase_decode_with_catalog_changes)
> >> + return true;
> >
> > What? I'm INCREDIBLY doubtful this is a sane thing to expose to output
> > plugins. As in, unless I hear a very very convincing reason I'm strongly
> > opposed.
> >
>
> These bools are specific to the test_decoding plugin.

txn_has_catalog_changes() definitely isn't just exposed to
test_decoding. I think you're making the output plugin interface
massively more complicated in this patch and I think we need to push
back on that.

> Again, these are useful in testing decoding in various scenarios with
> twophase decoding enabled/disabled. Testing decoding when catalog
> changes are allowed/disallowed etc. Please take a look at
> "contrib/test_decoding/sql/prepared.sql" for the various scenarios.

I don't se ehow that addresses my concern in any sort of way.

> >> +/*
> >> + * Check if we should continue to decode this transaction.
> >> + *
> >> + * If it has aborted in the meanwhile, then there's no sense
> >> + * in decoding and sending the rest of the changes, we might
> >> + * as well ask the subscribers to abort immediately.
> >> + *
> >> + * This should be called if we are streaming a transaction
> >> + * before it's committed or if we are decoding a 2PC
> >> + * transaction. Otherwise we always decode committed
> >> + * transactions
> >> + *
> >> + * Additional checks can be added here, as needed
> >> + */
> >> +static bool
> >> +pg_filter_decode_txn(LogicalDecodingContext *ctx,
> >> + ReorderBufferTXN *txn)
> >> +{
> >> + /*
> >> + * Due to caching, repeated TransactionIdDidAbort calls
> >> + * shouldn't be that expensive
> >> + */
> >> + if (txn != NULL &&
> >> + TransactionIdIsValid(txn->xid) &&
> >> + TransactionIdDidAbort(txn->xid))
> >> + return true;
> >> +
> >> + /* if txn is NULL, filter it out */
> >
> > Why can this be NULL?
> >
>
> Depending on parameters passed to the ReorderBufferTXNByXid()
> function, the txn might be NULL in some cases, especially during
> restarts.

That a) isn't an explanation why that's ok b) reasoning why this ever
needs to be exposed to the output plugin.

> >> static bool
> >> pg_decode_filter(LogicalDecodingContext *ctx,
> >> RepOriginId origin_id)
> >> @@ -409,8 +622,18 @@ pg_decode_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
> >> }
> >> data->xact_wrote_changes = true;
> >>
> >> + if (!LogicalLockTransaction(txn))
> >> + return;
> >
> > It really really can't be right that this is exposed to output plugins.
> >
>
> This was discussed in the other thread
> (http://www.postgresql-archive.org/Logical-Decoding-and-HeapTupleSatisfiesVacuum-assumptions-td5998294i20.html).
> Any catalog access in any plugins need to interlock with concurrent
> aborts. This is only a problem if the transaction is a prepared one or
> yet uncommitted one. Rest of the majority of the cases, this function
> will do nothing at all.

That doesn't address at all that it's not ok that the output plugin
needs to handle this. Doing this in output plugins, the majority of
which are external projects, means that a) the work needs to be done
many times. b) we can't simply adjust the relevant code in a minor
release, because every output plugin needs to be changed.

> >
> >> + /* if decode_delay is specified, sleep with above lock held */
> >> + if (data->decode_delay > 0)
> >> + {
> >> + elog(LOG, "sleeping for %d seconds", data->decode_delay);
> >> + pg_usleep(data->decode_delay * 1000000L);
> >> + }
> >
> > Really not on board.
> >
>
> Again, specific to test_decoding plugin.

Again, this is not a justification. People look at the code to write
output plugins. Also see my above complaint about this going to be hell
to get right on slow buildfarm members - we're going to crank up the
sleep times to make it robust-ish.

> >> + XLogSetRecordFlags(XLOG_INCLUDE_ORIGIN);
> >> +
> >
> > Can we perhaps merge a bit of the code with the plain commit path on
> > this?
> >
>
> Given that PREPARE ROLLBACK handling is totally separate from the
> regular commit code paths, wouldn't it be a little difficult?

Why? A helper function doing so ought to be doable.

> >> @@ -1386,8 +1449,20 @@ FinishPreparedTransaction(const char *gid, bool isCommit)
> >> /*
> >> * Validate the GID, and lock the GXACT to ensure that two backends do not
> >> * try to commit the same GID at once.
> >> + *
> >> + * During logical decoding, on the apply side, it's possible that a prepared
> >> + * transaction got aborted while decoding. In that case, we stop the
> >> + * decoding and abort the transaction immediately. However the ROLLBACK
> >> + * prepared processing still reaches the subscriber. In that case it's ok
> >> + * to have a missing gid
> >> */
> >> - gxact = LockGXact(gid, GetUserId());
> >> + gxact = LockGXact(gid, GetUserId(), missing_ok);
> >> + if (gxact == NULL)
> >> + {
> >> + Assert(missing_ok && !isCommit);
> >> + return;
> >> + }
> >
> > I'm very doubtful it is sane to handle this at such a low level.
> >
>
> FinishPreparedTransaction() is called directly from ProcessUtility. If
> not here, where else could we do this?

I don't think this is something that ought to be handled at this layer
at all. You should get an error in that case, the replay logic needs to
handle that, not the low level 2pc code.

> >> @@ -2358,6 +2443,13 @@ PrepareRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn)
> >> Assert(TwoPhaseState->numPrepXacts < max_prepared_xacts);
> >> TwoPhaseState->prepXacts[TwoPhaseState->numPrepXacts++] = gxact;
> >>
> >> + if (origin_id != InvalidRepOriginId)
> >> + {
> >> + /* recover apply progress */
> >> + replorigin_advance(origin_id, hdr->origin_lsn, end_lsn,
> >> + false /* backward */ , false /* WAL */ );
> >> + }
> >> +
> >
> > It's unclear to me why this is necessary / a good idea?
> >
>
> Keeping PREPARE handling as close to regular COMMIT handling seems
> like a good idea, no?

But this code *means* something? Explain to me why it's a good idea to
advance, or don't do it.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2018-02-12 16:25:49 Re: Bogosities in pg_dump's extended statistics support
Previous Message Tom Lane 2018-02-12 16:14:04 Re: Bogosities in pg_dump's extended statistics support