Re: Optionally automatically disable logical replication subscriptions on error

From: vignesh C <vignesh21(at)gmail(dot)com>
To: "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>
Cc: Greg Nancarrow <gregn4422(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Smith, Peter" <peters(at)fast(dot)au(dot)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Optionally automatically disable logical replication subscriptions on error
Date: 2021-11-12 04:08:35
Message-ID: CALDaNm0PdzqVSy0uUZ+UKd9rQeQONUjkALj3Ge+fAmdw0cSLgA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Nov 11, 2021 at 2:50 PM osumi(dot)takamichi(at)fujitsu(dot)com
<osumi(dot)takamichi(at)fujitsu(dot)com> wrote:
>
> On Wednesday, November 10, 2021 1:23 PM Greg Nancarrow <gregn4422(at)gmail(dot)com> wrote:
> > On Wed, Nov 10, 2021 at 12:26 PM osumi(dot)takamichi(at)fujitsu(dot)com
> > <osumi(dot)takamichi(at)fujitsu(dot)com> wrote:
> > >
> > > On Monday, November 8, 2021 10:15 PM vignesh C <vignesh21(at)gmail(dot)com>
> > wrote:
> > > > Thanks for the updated patch. Please create a Commitfest entry for
> > > > this. It will help to have a look at CFBot results for the patch,
> > > > also if required rebase and post a patch on top of Head.
> > > As requested, created a new entry for this - [1]
> > >
> > > FYI: the skip xid patch has been updated to v20 in [2] but the v3 for
> > > disable_on_error is not affected by this update and still applicable
> > > with no regression.
> > >
> > > [1] - https://commitfest.postgresql.org/36/3407/
> > > [2] -
> > >
> > https://www.postgresql.org/message-id/CAD21AoAT42mhcqeB1jPfRL1%2B
> > EUHbZ
> > > k8MMY_fBgsyZvJeKNpG%2Bw%40mail.gmail.com
> >
> > I had a look at this patch and have a couple of initial review comments for some
> > issues I spotted:
> Thank you for checking it.
>
>
> > src/backend/commands/subscriptioncmds.c
> > (1) bad array entry assignment
> > The following code block added by the patch assigns
> > "values[Anum_pg_subscription_subdisableonerr - 1]" twice, resulting in it
> > being always set to true, rather than the specified option value:
> >
> > + if (IsSet(opts.specified_opts, SUBOPT_DISABLE_ON_ERR)) {
> > + values[Anum_pg_subscription_subdisableonerr - 1]
> > + = BoolGetDatum(opts.disableonerr);
> > + values[Anum_pg_subscription_subdisableonerr - 1]
> > + = true;
> > + }
> >
> > The 2nd line is meant to instead be
> > "replaces[Anum_pg_subscription_subdisableonerr - 1] = true".
> > (compare to handling for other similar options)
> Oops, fixed.
>
> > src/backend/replication/logical/worker.c
> > (2) unreachable code?
> > In the patch code there seems to be some instances of unreachable code after
> > re-throwing errors:
> >
> > e.g.
> >
> > + /* If we caught an error above, disable the subscription */ if
> > + (disable_subscription) {
> > + ReThrowError(DisableSubscriptionOnError(cctx));
> > + MemoryContextSwitchTo(ecxt);
> > + }
> >
> > + else
> > + {
> > + PG_RE_THROW();
> > + MemoryContextSwitchTo(ecxt);
> > + }
> >
> >
> > + if (disable_subscription)
> > + {
> > + ReThrowError(DisableSubscriptionOnError(cctx));
> > + MemoryContextSwitchTo(ecxt);
> > + }
> >
> > I'm guessing it was intended to do the "MemoryContextSwitch(ecxt);"
> > before re-throwing (?), but it's not really clear, as in the 1st and 3rd cases, the
> > DisableSubscriptionOnError() calls anyway immediately switch the memory
> > context to cctx.
> You are right I think.
> Fixed based on an idea below.
>
> After an error happens, for some additional work
> (e.g. to report the stats of table sync/apply worker
> by pgstat_report_subworker_error() or
> to update the catalog by DisableSubscriptionOnError())
> restore the memory context that is used before the error (cctx)
> and save the old memory context of error (ecxt). Then,
> do the additional work and switch the memory context to the ecxt
> just before the rethrow. As you described,
> in contrast to PG_RE_THROW, DisableSubscriptionOnError() changes
> the memory context immediatedly at the top of it,
> so for this case, I don't call the MemoryContextSwitchTo().
>
> Another important thing as my modification
> is a case when LogicalRepApplyLoop failed and
> apply_error_callback_arg.command == 0. In the original
> patch of skip xid, it just calls PG_RE_THROW()
> but my previous v3 codes missed this macro in this case.
> Therefore, I've fixed this part as well.
>
> C codes are checked by pgindent.
>
> Note that this depends on the v20 skip xide patch in [1]
>

Thanks for the updated patch, Few comments:
1) tab completion should be added for disable_on_error:
/* Complete "CREATE SUBSCRIPTION <name> ... WITH ( <opt>" */
else if (HeadMatches("CREATE", "SUBSCRIPTION") && TailMatches("WITH", "("))
COMPLETE_WITH("binary", "connect", "copy_data", "create_slot",
"enabled", "slot_name", "streaming",
"synchronous_commit", "two_phase");

2) disable_on_error is supported by alter subscription, the same
should be documented:
@ -871,11 +886,19 @@ AlterSubscription(ParseState *pstate,
AlterSubscriptionStmt *stmt,
{
supported_opts = (SUBOPT_SLOT_NAME |

SUBOPT_SYNCHRONOUS_COMMIT | SUBOPT_BINARY |
-
SUBOPT_STREAMING);
+
SUBOPT_STREAMING | SUBOPT_DISABLE_ON_ERR);

parse_subscription_options(pstate,
stmt->options,

supported_opts, &opts);

+ if (IsSet(opts.specified_opts,
SUBOPT_DISABLE_ON_ERR))
+ {
+
values[Anum_pg_subscription_subdisableonerr - 1]
+ =
BoolGetDatum(opts.disableonerr);
+
replaces[Anum_pg_subscription_subdisableonerr - 1]
+ = true;
+ }
+

3) Describe subscriptions (dRs+) should include displaying of disableonerr:
\dRs+ sub1
List of subscriptions
Name | Owner | Enabled | Publication | Binary | Streaming | Two
phase commit | Synchronous commit | Conninfo
------+---------+---------+-------------+--------+-----------+------------------+--------------------+---------------------------
sub1 | vignesh | t | {pub1} | f | f | d
| off | dbname=postgres port=5432
(1 row)

4) I felt transicent should be transient, might be a typo:
+ Specifies whether the subscription should be automatically disabled
+ if replicating data from the publisher triggers non-transicent errors
+ such as referential integrity or permissions errors. The default is
+ <literal>false</literal>.

5) The commented use PostgresNode and use TestLib can be removed:
+# Test of logical replication subscription self-disabling feature
+use strict;
+use warnings;
+# use PostgresNode;
+# use TestLib;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More tests => 10;

Regards,
Vignesh

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2021-11-12 04:11:01 Re: Data is copied twice when specifying both child and parent table in publication
Previous Message Amit Kapila 2021-11-12 03:43:59 Re: Parallel vacuum workers prevent the oldest xmin from advancing