From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Ajin Cherian <itsajin(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] logical decoding of two-phase transactions
Date: 2021-02-26 10:58:26
Message-ID: CAA4eK1LodEqax+xYOYdqgY5oEM54TdjagA0zT7QjKiC0NRNv=g@mail.gmail.com
Lists: pgsql-hackers

On Fri, Feb 26, 2021 at 9:56 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Thu, Feb 25, 2021 at 12:32 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> >
>
> 5. You need to write/sync the spool file at prepare time because after
> restart between prepare and commit prepared the changes can be lost
> and won't be resent by the publisher assuming there are commits of
> other transactions between prepare and commit prepared. For the same
> reason, I am not sure if we can just rely on the in-memory hash table
> for it (prepare_spoolfile_exists). Sure, if it exists and there is no
> restart then it would be cheap to check in the hash table but I don't
> think it is guaranteed.
>

As we can't rely on the hash table, I think we can get rid of it and
always check if the corresponding file exists.

Few more comments on v43-0007-Fix-apply-worker-empty-prepare
====================================================
1.
+ * So the "table_states_not_ready" list might end up having a READY
+ * state it it even though

The above sentence doesn't sound correct to me.

2.
@@ -759,6 +798,79 @@ apply_handle_begin_prepare(StringInfo s)
{
..
+ */
+ if (!am_tablesync_worker())
+ {

I think here we should have an Assert for tablesync worker because it
should never receive prepare.

3.
+ while (BusyTablesyncs())
+ {
+ elog(DEBUG1, "apply_handle_begin_prepare - waiting for all sync workers to be DONE/READY");
+
+ process_syncing_tables(begin_data.end_lsn);

..
+ if (begin_data.end_lsn < BiggestTablesyncLSN()

In both the above places, you need to use begin_data.final_lsn because
the prepare is not yet replayed, so we can't use its end_lsn for the
sync-up.

4.
+/*
+ * Are there any tablesyncs which have still not yet reached SYNCDONE/READY state?
+ */
+bool
+BusyTablesyncs()

The function name is not clear enough. Can we change it to something
like AnyTableSyncInProgress?

5.
+/*
+ * Are there any tablesyncs which have still not yet reached SYNCDONE/READY state?
+ */
+bool
+BusyTablesyncs()
{
..
+ /*
+ * XXX - When the process_syncing_tables_for_sync changes the state
+ * from SYNCDONE to READY, that change is actually written directly

In the above comment, do you mean process_syncing_tables_for_apply,
because that is where we change the state to READY? And, I don't think
we need to mark this comment as XXX.

6.
+ * XXX - Is there a potential timing problem here - e.g. if signal arrives
+ * while executing this then maybe we will set table_states_valid without
+ * refetching them?
+ */
+static void
+FetchTableStates(bool *started_tx)
..

Can you explain which race condition you are worried about here that
was not possible earlier but can happen after this patch?

7.
@@ -941,6 +1162,26 @@ apply_handle_stream_prepare(StringInfo s)
elog(DEBUG1, "received prepare for streamed transaction %u", xid);

/*
+ * Wait for all the sync workers to reach the SYNCDONE/READY state.
+ *
+ * This is same waiting logic as in appy_handle_begin_prepare function
+ * (see that function for more details about this).
+ */
+ if (!am_tablesync_worker())
+ {
+ while (BusyTablesyncs())
+ {
+ process_syncing_tables(prepare_data.end_lsn);
+
+ /* This latch is to prevent 100% CPU looping. */
+ (void) WaitLatch(MyLatch,
+ WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+ 1000L, WAIT_EVENT_LOGICAL_SYNC_STATE_CHANGE);
+ ResetLatch(MyLatch);
+ }
+ }

I think we need similar handling in stream_prepare as in begin_prepare
for writing to a spool file because this has the same danger. But here
we need to write the xid spool file tracked in StreamXidHash. Another
thing we need to ensure is to sync that file in stream prepare so that
it can survive restarts. Then in apply_handle_commit_prepared, after
checking for the prepared spool file, we need to check the existence of
the xid spool file, and if it exists then apply messages from that file.

Again, like begin_prepare, in apply_handle_stream_prepare we should
also have an Assert for the tablesync worker.

I feel that the combination of 2PC and streaming is a bit complicated
to deal with. How about, for now, we don't allow users to enable
streaming if the 2PC option is enabled for a Subscription? This
requires some change (error out if both the streaming and 2PC options
are enabled) in both CREATE SUBSCRIPTION and ALTER SUBSCRIPTION, but
that change should be fairly small. If we follow this, then in
apply_dispatch (for case LOGICAL_REP_MSG_STREAM_PREPARE), we should
report an ERROR "invalid logical replication message type".

--
With Regards,
Amit Kapila.
