Re: [HACKERS] logical decoding of two-phase transactions

From: Nikhil Sontakke <nikhils(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>, Sokolov Yura <y(dot)sokolov(at)postgrespro(dot)ru>, Stas Kelvich <s(dot)kelvich(at)postgrespro(dot)ru>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: [HACKERS] logical decoding of two-phase transactions
Date: 2018-02-28 15:42:42
Message-ID: CAMGcDxeViP+R-OL7QhzUV9eKCVjURobuY1Zijik4Ay_Ddwo4Cg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

Hi Andres,

>> > First off: This patch has way too many different types of changes as
>> > part of one huge commit. This needs to be split into several
>> > pieces. First the cleanups (e.g. the fields -> flag changes), then the
>> > individual infrastructure pieces (like the twophase.c changes, best
>> > split into several pieces as well, the locking stuff), then the main
>> > feature, then support for it in the output plugin. Each should have an
>> > individual explanation about why the change is necessary and not a bad
>> > idea.
>> >
>>
>> Ok, I will break this patch into multiple logical pieces and re-submit.
>
> Thanks.
>

Attached are 5 patches split up from the original patch that I had
submitted earlier.

ReorderBufferTXN_flags_cleanup_1.patch:
cleanup of the ReorderBufferTXN bools and addition of some new flags
that following patches will need.

Logical_lock_unlock_api_2.patch:
Streaming changes of uncommitted transactions and of prepared
transaction runs the risk of aborts (rollback prepared) happening
while we are decoding. It's not a problem for most transactions, but
some of the transactions which do catalog changes need to get a
consistent view of the metadata so that the decoding does not behave
in uncertain ways when such concurrent aborts occur. We came up with
the concept of a logical locking/unlocking API to safeguard access to
catalog tables. This patch contains the implementation for this
functionality.

2PC_gid_wal_and_2PC_origin_tracking_3.patch:
We now store the 2PC gid in the commit/abort records. This allows us
to send the proper gid to the downstream across restarts. We also want
to avoid receiving the prepared transaction AGAIN from the upstream
and use replorigin tracking across prepared transactions.

reorderbuffer_2PC_logic_4.patch:
Add decoding logic to understand PREPARE related wal records and
relevant changes in the reorderbuffer logic to deal with 2PC. This
includes logic to handle concurrent rollbacks while we are going
through the change buffers belonging to a prepared or uncommitted
transaction.

pgoutput_plugin_support_2PC_5.patch:
Logical protocol changes to apply and send changes via the internal
pgoutput output plugin. Includes test case and relevant documentation
changes.

Besides the above, you had feedback around the test_decoding plugin
and the use of sleep() etc. I will submit a follow-on patch for the
test_decoding plugin stuff soon.

More comments inline below.

>> >> bool only_local;
>> >> + bool twophase_decoding;
>> >> + bool twophase_decode_with_catalog_changes;
>> >> + int decode_delay; /* seconds to sleep after every change record */
>> >
>> > This seems too big a crock to add just for testing. It'll also make the
>> > testing timing dependent...
>> >
>>
>> The idea *was* to make testing timing dependent. We wanted to simulate
>> the case when a rollback is issued by another backend while the
>> decoding is still ongoing. This allows that test case to be tested.
>
> What I mean is that this will be hell on the buildfarm because the
> different animals are differently fast.
>

Will handle this in the test_decoding plugin patch soon.

>
>> >> +/* Filter out unnecessary two-phase transactions */
>> >> +static bool
>> >> +pg_filter_prepare(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
>> >> + TransactionId xid, const char *gid)
>> >> +{
>> >> + TestDecodingData *data = ctx->output_plugin_private;
>> >> +
>> >> + /* treat all transactions as one-phase */
>> >> + if (!data->twophase_decoding)
>> >> + return true;
>> >> +
>> >> + if (txn && txn_has_catalog_changes(txn) &&
>> >> + !data->twophase_decode_with_catalog_changes)
>> >> + return true;
>> >
>> > What? I'm INCREDIBLY doubtful this is a sane thing to expose to output
>> > plugins. As in, unless I hear a very very convincing reason I'm strongly
>> > opposed.
>> >
>>
>> These bools are specific to the test_decoding plugin.
>

Will handle in the test_decoding plugin patch soon.

> txn_has_catalog_changes() definitely isn't just exposed to
> test_decoding. I think you're making the output plugin interface
> massively more complicated in this patch and I think we need to push
> back on that.
>
>
>> Again, these are useful in testing decoding in various scenarios with
>> twophase decoding enabled/disabled. Testing decoding when catalog
>> changes are allowed/disallowed etc. Please take a look at
>> "contrib/test_decoding/sql/prepared.sql" for the various scenarios.
>
> I don't se ehow that addresses my concern in any sort of way.
>

Will handle in the test_decoding plugin patch soon.

>
>> >> +/*
>> >> + * Check if we should continue to decode this transaction.
>> >> + *
>> >> + * If it has aborted in the meanwhile, then there's no sense
>> >> + * in decoding and sending the rest of the changes, we might
>> >> + * as well ask the subscribers to abort immediately.
>> >> + *
>> >> + * This should be called if we are streaming a transaction
>> >> + * before it's committed or if we are decoding a 2PC
>> >> + * transaction. Otherwise we always decode committed
>> >> + * transactions
>> >> + *
>> >> + * Additional checks can be added here, as needed
>> >> + */
>> >> +static bool
>> >> +pg_filter_decode_txn(LogicalDecodingContext *ctx,
>> >> + ReorderBufferTXN *txn)
>> >> +{
>> >> + /*
>> >> + * Due to caching, repeated TransactionIdDidAbort calls
>> >> + * shouldn't be that expensive
>> >> + */
>> >> + if (txn != NULL &&
>> >> + TransactionIdIsValid(txn->xid) &&
>> >> + TransactionIdDidAbort(txn->xid))
>> >> + return true;
>> >> +
>> >> + /* if txn is NULL, filter it out */
>> >
>> > Why can this be NULL?
>> >
>>
>> Depending on parameters passed to the ReorderBufferTXNByXid()
>> function, the txn might be NULL in some cases, especially during
>> restarts.
>
> That a) isn't an explanation why that's ok b) reasoning why this ever
> needs to be exposed to the output plugin.
>

Removing this pg_filter_decode_txn() function. You are right, there's
no need to expose this function to the output plugin and we can make
the decision entirely inside the ReorderBuffer code handling.

>
>> >> static bool
>> >> pg_decode_filter(LogicalDecodingContext *ctx,
>> >> RepOriginId origin_id)
>> >> @@ -409,8 +622,18 @@ pg_decode_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
>> >> }
>> >> data->xact_wrote_changes = true;
>> >>
>> >> + if (!LogicalLockTransaction(txn))
>> >> + return;
>> >
>> > It really really can't be right that this is exposed to output plugins.
>> >
>>
>> This was discussed in the other thread
>> (http://www.postgresql-archive.org/Logical-Decoding-and-HeapTupleSatisfiesVacuum-assumptions-td5998294i20.html).
>> Any catalog access in any plugins need to interlock with concurrent
>> aborts. This is only a problem if the transaction is a prepared one or
>> yet uncommitted one. Rest of the majority of the cases, this function
>> will do nothing at all.
>
> That doesn't address at all that it's not ok that the output plugin
> needs to handle this. Doing this in output plugins, the majority of
> which are external projects, means that a) the work needs to be done
> many times. b) we can't simply adjust the relevant code in a minor
> release, because every output plugin needs to be changed.
>

How do we know if the external project is going to access catalog
data? How do we ensure that the data that they access is safe from
concurrent aborts if we are decoding uncommitted or prepared
transactions? We are providing a guideline here and recommending them
to use these APIs if they need to.

>> >
>> >> + /* if decode_delay is specified, sleep with above lock held */
>> >> + if (data->decode_delay > 0)
>> >> + {
>> >> + elog(LOG, "sleeping for %d seconds", data->decode_delay);
>> >> + pg_usleep(data->decode_delay * 1000000L);
>> >> + }
>> >
>> > Really not on board.
>> >
>>
>> Again, specific to test_decoding plugin.
>
> Again, this is not a justification. People look at the code to write
> output plugins. Also see my above complaint about this going to be hell
> to get right on slow buildfarm members - we're going to crank up the
> sleep times to make it robust-ish.
>

Sure, as mentioned above, will come up with a different way for the
test_decoding plugin later.

>
>
>> >> + XLogSetRecordFlags(XLOG_INCLUDE_ORIGIN);
>> >> +
>> >
>> > Can we perhaps merge a bit of the code with the plain commit path on
>> > this?
>> >
>>
>> Given that PREPARE ROLLBACK handling is totally separate from the
>> regular commit code paths, wouldn't it be a little difficult?
>
> Why? A helper function doing so ought to be doable.
>

Can you elaborate on what exactly you mean here?

>
>
>> >> @@ -1386,8 +1449,20 @@ FinishPreparedTransaction(const char *gid, bool isCommit)
>> >> /*
>> >> * Validate the GID, and lock the GXACT to ensure that two backends do not
>> >> * try to commit the same GID at once.
>> >> + *
>> >> + * During logical decoding, on the apply side, it's possible that a prepared
>> >> + * transaction got aborted while decoding. In that case, we stop the
>> >> + * decoding and abort the transaction immediately. However the ROLLBACK
>> >> + * prepared processing still reaches the subscriber. In that case it's ok
>> >> + * to have a missing gid
>> >> */
>> >> - gxact = LockGXact(gid, GetUserId());
>> >> + gxact = LockGXact(gid, GetUserId(), missing_ok);
>> >> + if (gxact == NULL)
>> >> + {
>> >> + Assert(missing_ok && !isCommit);
>> >> + return;
>> >> + }
>> >
>> > I'm very doubtful it is sane to handle this at such a low level.
>> >
>>
>> FinishPreparedTransaction() is called directly from ProcessUtility. If
>> not here, where else could we do this?
>
> I don't think this is something that ought to be handled at this layer
> at all. You should get an error in that case, the replay logic needs to
> handle that, not the low level 2pc code.
>

Removed the above changes. The replay logic now checks if the GID
still exists in the abort rollback codepath. If not, it returns
immediately. In case of commit rollback replay, the GID has to
obviously exist at the downstream.

>
>> >> @@ -2358,6 +2443,13 @@ PrepareRedoAdd(char *buf, XLogRecPtr start_lsn, XLogRecPtr end_lsn)
>> >> Assert(TwoPhaseState->numPrepXacts < max_prepared_xacts);
>> >> TwoPhaseState->prepXacts[TwoPhaseState->numPrepXacts++] = gxact;
>> >>
>> >> + if (origin_id != InvalidRepOriginId)
>> >> + {
>> >> + /* recover apply progress */
>> >> + replorigin_advance(origin_id, hdr->origin_lsn, end_lsn,
>> >> + false /* backward */ , false /* WAL */ );
>> >> + }
>> >> +
>> >
>> > It's unclear to me why this is necessary / a good idea?
>> >
>>
>> Keeping PREPARE handling as close to regular COMMIT handling seems
>> like a good idea, no?
>
> But this code *means* something? Explain to me why it's a good idea to
> advance, or don't do it.
>

We want to do this to use it as protection against receiving the
prepared tx again.

Other than the above,

*) Changed the flags and added "RB" prefix to all flags and macros.

*) Added a few fields into existing xl_xact_parsed_commit record and avoided
creating an entirely new xl_xact_parsed_prepare record.

Regards,
Nikhils
--
Nikhil Sontakke http://www.2ndQuadrant.com/
PostgreSQL/Postgres-XL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
pgoutput_plugin_support_2PC_5.patch application/octet-stream 34.0 KB
reorderbuffer_2PC_logic_4.patch application/octet-stream 31.6 KB
2PC_gid_wal_and_2PC_origin_tracking_3.patch application/octet-stream 17.9 KB
ReorderBufferTXN_flags_cleanup_1.patch application/octet-stream 7.8 KB
Logical_lock_unlock_api_2.patch application/octet-stream 18.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2018-02-28 15:43:43 Re: Kerberos test suite
Previous Message Alex Ignatov 2018-02-28 15:23:04 RE: Direct converting numeric types to bool