Re: Perform streaming logical transactions by background workers and parallel apply

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Perform streaming logical transactions by background workers and parallel apply
Date: 2022-08-12 09:21:51
Message-ID: CAHut+Pu660Kv0DX8ZHfkfTAJd+2wq5-QX6VHzV_muDPmOhejHQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here are some review comments for v20-0004:

(This completes my reviews of the v20* patch set. Sorry, the reviews
are time consuming, so I am lagging slightly behind the latest posted
version)

======

1. doc/src/sgml/ref/create_subscription.sgml

@@ -245,6 +245,11 @@ CREATE SUBSCRIPTION <replaceable
class="parameter">subscription_name</replaceabl
also be the unique column on the publisher-side; 2) there cannot be
any non-immutable functions used by the subscriber-side replicated
table.
+ When applying a streaming transaction, if either requirement is not
+ met, the background worker will exit with an error.
+ The <literal>parallel</literal> mode is disregarded when retrying;
+ instead the transaction will be applied using <literal>on</literal>
+ mode.
</para>

The "on mode" still sounds strange to me. Maybe it's just my personal
opinion, but I don’t really consider 'on' and 'off' to be "modes".
Anyway I already posted the same comment several times before [1,
#4.3]. Let's see what others think.

SUGGESTION
"using on mode" -> "using streaming = on"

======

2. src/backend/replication/logical/worker.c - start_table_sync

@@ -3902,20 +3925,28 @@ start_table_sync(XLogRecPtr *origin_startpos,
char **myslotname)
}
PG_CATCH();
{
+ /*
+ * Emit the error message, and recover from the error state to an idle
+ * state
+ */
+ HOLD_INTERRUPTS();
+
+ EmitErrorReport();
+ AbortOutOfAnyTransaction();
+ FlushErrorState();
+
+ RESUME_INTERRUPTS();
+
+ /* Report the worker failed during table synchronization */
+ pgstat_report_subscription_error(MySubscription->oid, false);
+
if (MySubscription->disableonerr)
- DisableSubscriptionAndExit();
- else
- {
- /*
- * Report the worker failed during table synchronization. Abort
- * the current transaction so that the stats message is sent in an
- * idle state.
- */
- AbortOutOfAnyTransaction();
- pgstat_report_subscription_error(MySubscription->oid, false);
+ DisableSubscriptionOnError();

- PG_RE_THROW();
- }
+ /* Set the retry flag. */
+ set_subscription_retry(true);
+
+ proc_exit(0);
}
PG_END_TRY();

Perhaps current code is OK, but I am not 100% sure if we should set
the retry flag when the disable_on_error is set, because the
subscription is not going to be retried (because it is disabled). And
later, if/when the user does enable the subscription, presumably that
will be after they have already addressed the problem that caused the
error/disablement in the first place.

~~~

3. src/backend/replication/logical/worker.c - start_apply

PG_CATCH();
{
+ /*
+ * Emit the error message, and recover from the error state to an idle
+ * state
+ */
+ HOLD_INTERRUPTS();
+
+ EmitErrorReport();
+ AbortOutOfAnyTransaction();
+ FlushErrorState();
+
+ RESUME_INTERRUPTS();
+
+ /* Report the worker failed while applying changes */
+ pgstat_report_subscription_error(MySubscription->oid,
+ !am_tablesync_worker());
+
if (MySubscription->disableonerr)
- DisableSubscriptionAndExit();
- else
- {
- /*
- * Report the worker failed while applying changes. Abort the
- * current transaction so that the stats message is sent in an
- * idle state.
- */
- AbortOutOfAnyTransaction();
- pgstat_report_subscription_error(MySubscription->oid, !am_tablesync_worker());
+ DisableSubscriptionOnError();

- PG_RE_THROW();
- }
+ /* Set the retry flag. */
+ set_subscription_retry(true);
}
PG_END_TRY();
}

3a.
Same comment as #2

3b.
This PG_CATCH used to leave by either proc_exit(0) or PG_RE_THROW but
what does it do now? My first impression is there is a bug here due to
some missing code, because AFAICT the exception is caught and gobbled
up and then what...?

~~~

4. src/backend/replication/logical/worker.c - set_subscription_retry

+ if (MySubscription->retry == retry ||
+ am_apply_bgworker())
+ return;

4a.
I this this quick exit can be split and given some appropriate comments

SUGGESTION (for example)
/* Fast path - if no state change then nothing to do */
if (MySubscription->retry == retry)
return;

/* Fast path - skip for apply background workers */
if (am_apply_bgworker())
return;

======

5. .../subscription/t/032_streaming_apply.pl

@@ -78,9 +78,13 @@ my $timer =
IPC::Run::timeout($PostgreSQL::Test::Utils::timeout_default);
my $h = $node_publisher->background_psql('postgres', \$in, \$out, $timer,
on_error_stop => 0);

+# ============================================================================

All those comment highlighting lines like "# ==============" really
belong in the earlier patch (0003 ?) when this TAP test file was
introduced.

------
[1] https://www.postgresql.org/message-id/CAHut%2BPvrw%2BtgCEYGxv%2BnKrqg-zbJdYEXee6o4irPAsYoXcuUcw%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2022-08-12 10:32:35 Re: Support logical replication of DDLs
Previous Message Amit Kapila 2022-08-12 08:47:42 Re: Introduce wait_for_subscription_sync for TAP tests