Re: [HACKERS] logical decoding of two-phase transactions

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Ajin Cherian <itsajin(at)gmail(dot)com>
Cc: "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] logical decoding of two-phase transactions
Date: 2021-05-31 11:16:27
Message-ID: CAA4eK1Jd9sqWtt5kEJZL1ehJB2y_DFnvDjY9vJ51k8Wq6XWVyw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, May 28, 2021 at 11:55 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> One minor comment for 0001.
> * Special case: if when tables were specified but copy_data is
> + * false then it is safe to enable two_phase up-front because
> + * those tables are already initially READY state. Note, if
> + * the subscription has no tables then enablement cannot be
> + * done here - we must leave the twophase state as PENDING, to
> + * allow ALTER SUBSCRIPTION ... REFRESH PUBLICATION to work.
>
> Can we slightly modify this comment as: "Note that if tables were
> specified but copy_data is false then it is safe to enable two_phase
> up-front because those tables are already initially READY state. When
> the subscription has no tables, we leave the twophase state as
> PENDING, to allow ALTER SUBSCRIPTION ... REFRESH PUBLICATION to work."
>
> Also, I don't see any test after you enable this special case. Is it
> covered by existing tests, if not then let's try to add a test for
> this?
>

I see that Ajin's latest patch has addressed the other comments except
for the above test case suggestion. I have again reviewed the first
patch and have some comments.

Comments on v81-0001-Add-support-for-prepared-transactions-to-built-i
============================================================================
1.
<para>
The logical replication solution that builds distributed two
phase commit
using this feature can deadlock if the prepared transaction has locked
- [user] catalog tables exclusively. They need to inform users to not have
- locks on catalog tables (via explicit <command>LOCK</command>
command) in
- such transactions.
+ [user] catalog tables exclusively. To avoid this users must refrain from
+ having locks on catalog tables (via explicit
<command>LOCK</command> command)
+ in such transactions.
</para>

This change doesn't belong to this patch. I see the proposed text
could be considered as an improvement but still we can do this
separately. We are already trying to improve things in this regard in
the thread [1], so you can propose this change there.

2.
+<varlistentry>
+<term>Byte1('K')</term>
+<listitem><para>
+ Identifies the message as the commit of a two-phase
transaction message.
+</para></listitem>
+</varlistentry>
+
+<varlistentry>
+<term>Int8</term>
+<listitem><para>
+ Flags; currently unused (must be 0).
+</para></listitem>
+</varlistentry>
+
+<varlistentry>
+<term>Int64</term>
+<listitem><para>
+ The LSN of the commit.
+</para></listitem>
+</varlistentry>
+
+<varlistentry>
+<term>Int64</term>
+<listitem><para>
+ The end LSN of the commit transaction.
+</para></listitem>
+</varlistentry>

Can we change the description of LSN's as "The LSN of the commit
prepared." and "The end LSN of the commit prepared transaction."
respectively? This will make their description different from regular
commit and I think that defines them better.

3.
+<varlistentry>
+<term>Int64</term>
+<listitem><para>
+ The end LSN of the rollback transaction.
+</para></listitem>
+</varlistentry>

Similar to above, can we change the description here as: "The end LSN
of the rollback prepared transaction."?

4.
+ * The exception to this restriction is when copy_data =
+ * false, because when copy_data is false the tablesync will
+ * start already in READY state and will exit directly without
+ * doing anything which could interfere with the apply
+ * worker's message handling.
+ *
+ * For more details see comments atop worker.c.
+ */
+ if (sub->twophasestate == LOGICALREP_TWOPHASE_STATE_ENABLED && copy_data)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("ALTER SUBSCRIPTION ... REFRESH with copy_data is not allowed
when two_phase is enabled"),
+ errhint("Use ALTER SUBSCRIPTION ... REFRESH with copy_data = false"
+ ", or use DROP/CREATE SUBSCRIPTION.")));

The above comment is a bit unclear because it seems you are saying
there is some problem even when copy_data is false. Are you missing
'not' after 'could' in the comment?

5.
XXX Now, this can even lead to a deadlock if the prepare
* transaction is waiting to get it logically replicated for
- * distributed 2PC. Currently, we don't have an in-core
- * implementation of prepares for distributed 2PC but some
- * out-of-core logical replication solution can have such an
- * implementation. They need to inform users to not have locks
- * on catalog tables in such transactions.
+ * distributed 2PC. This can be avoided by disallowing to
+ * prepare transactions that have locked [user] catalog tables
+ * exclusively.

Can we slightly modify this part of the comment as: "This can be
avoided by disallowing to prepare transactions that have locked [user]
catalog tables exclusively but as of now we ask users not to do such
operation"?

6.
+AllTablesyncsReady(void)
+{
+ bool found_busy = false;
+ bool started_tx = false;
+ bool has_subrels = false;
+
+ /* We need up-to-date sync state info for subscription tables here. */
+ has_subrels = FetchTableStates(&started_tx);
+
+ found_busy = list_length(table_states_not_ready) > 0;
+
+ if (started_tx)
+ {
+ CommitTransactionCommand();
+ pgstat_report_stat(false);
+ }
+
+ /*
+ * When there are no tables, then return false.
+ * When no tablesyncs are busy, then all are READY
+ */
+ return has_subrels && !found_busy;
+}

Do we really need found_busy variable in above function. Can't we
change the return as (has_subrels) && (table_states_not_ready != NIL)?
If so, then change the comments above return.

7.
+/*
+ * Common code to fetch the up-to-date sync state info into the static lists.
+ *
+ * Returns true if subscription has 1 or more tables, else false.
+ */
+static bool
+FetchTableStates(bool *started_tx)

Can we update comments indicating that if this function starts the
transaction then the caller is responsible to commit it?

8.
(errmsg("logical replication apply worker for subscription \"%s\" will
restart so two_phase can be enabled",
+ MySubscription->name)));

Can we slightly change the message as: "logical replication apply
worker for subscription \"%s\" will restart so that two_phase can be
enabled"?

9.
+void
+UpdateTwoPhaseState(Oid suboid, char new_state)
{
..
+ /* And update/set two_phase ENABLED */
+ values[Anum_pg_subscription_subtwophasestate - 1] = CharGetDatum(new_state);
+ replaces[Anum_pg_subscription_subtwophasestate - 1] = true;
..
}

The above comment seems wrong to me as we are updating the state as
passed by the caller.

[1] - https://www.postgresql.org/message-id/20210222222847.tpnb6eg3yiykzpky%40alap3.anarazel.de

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2021-05-31 11:40:50 Re: Skipping logical replication transactions on subscriber side
Previous Message Dilip Kumar 2021-05-31 10:59:51 Re: Decoding speculative insert with toast leaks memory