RE: Perform streaming logical transactions by background workers and parallel apply

From: "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Perform streaming logical transactions by background workers and parallel apply
Date: 2022-08-16 07:37:00
Message-ID: OS3PR01MB62755190D8302CDDFB72C3189E6B9@OS3PR01MB6275.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, August 12, 2022 17:22 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> Here are some review comments for v20-0004:
>
> (This completes my reviews of the v20* patch set. Sorry, the reviews
> are time consuming, so I am lagging slightly behind the latest posted
> version)

Thanks for your comments.

> 1. doc/src/sgml/ref/create_subscription.sgml
>
> @@ -245,6 +245,11 @@ CREATE SUBSCRIPTION <replaceable
> class="parameter">subscription_name</replaceabl
> also be the unique column on the publisher-side; 2) there cannot be
> any non-immutable functions used by the subscriber-side replicated
> table.
> + When applying a streaming transaction, if either requirement is not
> + met, the background worker will exit with an error.
> + The <literal>parallel</literal> mode is disregarded when retrying;
> + instead the transaction will be applied using <literal>on</literal>
> + mode.
> </para>
>
> The "on mode" still sounds strange to me. Maybe it's just my personal
> opinion, but I don’t really consider 'on' and 'off' to be "modes".
> Anyway I already posted the same comment several times before [1,
> #4.3]. Let's see what others think.
>
> SUGGESTION
> "using on mode" -> "using streaming = on"

Okay, I will follow the relevant comments later.

> 2. src/backend/replication/logical/worker.c - start_table_sync
>
> @@ -3902,20 +3925,28 @@ start_table_sync(XLogRecPtr *origin_startpos,
> char **myslotname)
> }
> PG_CATCH();
> {
> + /*
> + * Emit the error message, and recover from the error state to an idle
> + * state
> + */
> + HOLD_INTERRUPTS();
> +
> + EmitErrorReport();
> + AbortOutOfAnyTransaction();
> + FlushErrorState();
> +
> + RESUME_INTERRUPTS();
> +
> + /* Report the worker failed during table synchronization */
> + pgstat_report_subscription_error(MySubscription->oid, false);
> +
> if (MySubscription->disableonerr)
> - DisableSubscriptionAndExit();
> - else
> - {
> - /*
> - * Report the worker failed during table synchronization. Abort
> - * the current transaction so that the stats message is sent in an
> - * idle state.
> - */
> - AbortOutOfAnyTransaction();
> - pgstat_report_subscription_error(MySubscription->oid, false);
> + DisableSubscriptionOnError();
>
> - PG_RE_THROW();
> - }
> + /* Set the retry flag. */
> + set_subscription_retry(true);
> +
> + proc_exit(0);
> }
> PG_END_TRY();
>
> Perhaps current code is OK, but I am not 100% sure if we should set
> the retry flag when the disable_on_error is set, because the
> subscription is not going to be retried (because it is disabled). And
> later, if/when the user does enable the subscription, presumably that
> will be after they have already addressed the problem that caused the
> error/disablement in the first place.

I think it is okay. Because even after addressing the problem, it is also
*retrying* to apply the failed transaction. And, in the worst case, it just
applies the first failed streaming transaction using "on" mode instead of
"parallel" mode.

> 3. src/backend/replication/logical/worker.c - start_apply
>
> PG_CATCH();
> {
> + /*
> + * Emit the error message, and recover from the error state to an idle
> + * state
> + */
> + HOLD_INTERRUPTS();
> +
> + EmitErrorReport();
> + AbortOutOfAnyTransaction();
> + FlushErrorState();
> +
> + RESUME_INTERRUPTS();
> +
> + /* Report the worker failed while applying changes */
> + pgstat_report_subscription_error(MySubscription->oid,
> + !am_tablesync_worker());
> +
> if (MySubscription->disableonerr)
> - DisableSubscriptionAndExit();
> - else
> - {
> - /*
> - * Report the worker failed while applying changes. Abort the
> - * current transaction so that the stats message is sent in an
> - * idle state.
> - */
> - AbortOutOfAnyTransaction();
> - pgstat_report_subscription_error(MySubscription-
> >oid, !am_tablesync_worker());
> + DisableSubscriptionOnError();
>
> - PG_RE_THROW();
> - }
> + /* Set the retry flag. */
> + set_subscription_retry(true);
> }
> PG_END_TRY();
> }
>
> 3a.
> Same comment as #2
>
> 3b.
> This PG_CATCH used to leave by either proc_exit(0) or PG_RE_THROW but
> what does it do now? My first impression is there is a bug here due to
> some missing code, because AFAICT the exception is caught and gobbled
> up and then what...?

=>3a.
See the reply to #2.
=>3b.
The function `proc_exit(0)` is invoked after invoking function start_apply. See
function ApplyWorkerMain.

> 4. src/backend/replication/logical/worker.c - set_subscription_retry
>
> + if (MySubscription->retry == retry ||
> + am_apply_bgworker())
> + return;
>
> 4a.
> I this this quick exit can be split and given some appropriate comments
>
> SUGGESTION (for example)
> /* Fast path - if no state change then nothing to do */
> if (MySubscription->retry == retry)
> return;
>
> /* Fast path - skip for apply background workers */
> if (am_apply_bgworker())
> return;

Changed.

> 5. .../subscription/t/032_streaming_apply.pl
>
> @@ -78,9 +78,13 @@ my $timer =
> IPC::Run::timeout($PostgreSQL::Test::Utils::timeout_default);
> my $h = $node_publisher->background_psql('postgres', \$in, \$out, $timer,
> on_error_stop => 0);
>
> +#
> =============================================================
> ===============
>
> All those comment highlighting lines like "# ==============" really
> belong in the earlier patch (0003 ?) when this TAP test file was
> introduced.

Changed.

The new patches were attached in [1].

[1] - https://www.postgresql.org/message-id/OS3PR01MB6275739E73E8BEC5D13FB6739E6B9%40OS3PR01MB6275.jpnprd01.prod.outlook.com

Regards,
Wang wei

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Drouvot, Bertrand 2022-08-16 08:01:13 Re: Patch proposal: New hooks in the connection path
Previous Message wangw.fnst@fujitsu.com 2022-08-16 07:33:04 RE: Perform streaming logical transactions by background workers and parallel apply