Re: repeated decoding of prepared transactions

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Markus Wanner <markus(dot)wanner(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>
Subject: Re: repeated decoding of prepared transactions
Date: 2021-02-11 10:36:50
Message-ID: CAA4eK1L5aX1BL9Xg-wSULbFeB417G0v9qk5qZ6NbYCkCo6JUGQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Feb 8, 2021 at 2:01 PM Markus Wanner
<markus(dot)wanner(at)enterprisedb(dot)com> wrote:
>
> I did not expect this, as any receiver that wants to have decoded 2PC is
> likely supporting some kind of two-phase commits itself. And would
> therefore prepare the transaction upon its first reception. Potentially
> receiving it a second time would require complicated filtering on every
> prepared transaction.
>

I would like to bring one other scenario to your notice where you
might want to handle things differently for prepared transactions on
the plugin side. Assume we have multiple publications (for simplicity
say 2) on publisher with corresponding subscriptions (say 2, each
corresponding to one publication on the publisher). When a user
performs a transaction on a publisher that involves the tables from
both publications, on the subscriber-side, we do that work via two
different transactions, corresponding to each subscription. But, we
need some way to deal with prepared xacts because they need GID and we
can't use the same GID for both subscriptions. Please see the detailed
example and one idea to deal with the same in the main thread[1]. It
would be really helpful if you or others working on the plugin side
can share your opinion on the same.

Now, coming back to the restart case where the prepared transaction
can be sent again by the publisher. I understand yours and others
point that we should not send prepared transaction if there is a
restart between prepare and commit but there are reasons why we have
done that way and I am open to your suggestions. I'll once again try
to explain the exact case to you which is not very apparent. The basic
idea is that we ship/replay all transactions where commit happens
after the snapshot has a consistent state (SNAPBUILD_CONSISTENT), see
atop snapbuild.c for details. Now, for transactions where prepare is
before snapshot state SNAPBUILD_CONSISTENT and commit prepared is
after SNAPBUILD_CONSISTENT, we need to send the entire transaction
including prepare at the commit time. One might think it is quite easy
to detect that, basically if we skip prepare when the snapshot state
was not SNAPBUILD_CONSISTENT, then mark a flag in ReorderBufferTxn and
use the same to detect during commit and accordingly take the decision
to send prepare but unfortunately it is not that easy. There is always
a chance that on restart we reuse the snapshot serialized by some
other Walsender at a location prior to Prepare and if that happens
then this time the prepare won't be skipped due to snapshot state
(SNAPBUILD_CONSISTENT) but due to start_decodint_at point (considering
we have already shipped some of the later commits but not prepare).
Now, this will actually become the same situation where the restart
has happened after we have sent the prepare but not commit. This is
the reason we have to resend the prepare when the subscriber restarts
between prepare and commit.

You can reproduce the case where we can't distinguish between two
situations by using the test case in twophase_snapshot.spec and
additionally starting a separate session via the debugger. So, the
steps in the test case are as below:

"s2b" "s2txid" "s1init" "s3b" "s3txid" "s2c" "s2b" "s2insert" "s2p"
"s3c" "s1insert" "s1start" "s2cp" "s1start"

Define new steps as

"s4init" {SELECT 'init' FROM
pg_create_logical_replication_slot('isolation_slot_1',
'test_decoding');}
"s4start" {SELECT data FROM
pg_logical_slot_get_changes('isolation_slot_1', NULL, NULL,
'include-xids', 'false', 'skip-empty-xacts', '1', 'two-phase-commit',
'1');}

The first thing we need to do is s4init and stop the debugger in
SnapBuildProcessRunningXacts. Now perform steps from 's2b' till first
's1start' in twophase_snapshot.spec. Then continue in the s4 session
and perform s4start. After this, if you debug (or add the logs) the
second s1start, you will notice that we are skipping prepare not
because of inconsistent snapshot but a forward location in
start_decoding_at. If you don't involve session-4, then it will always
skip prepare due to an inconsistent snapshot state. This involves a
debugger so not easy to write an automated test for it.

I have used a bit tricky scenario to explain this but not sure if
there was any other simpler way.

[1] - https://www.postgresql.org/message-id/CAA4eK1%2BLvkeX%3DB3xon7RcBwD4CVaFSryPj3pTBAALrDxQVPDwA%40mail.gmail.com

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dilip Kumar 2021-02-11 11:06:55 Re: Is Recovery actually paused?
Previous Message Petr Jelinek 2021-02-11 10:02:31 Re: Single transaction in the tablesync worker?