RE: Optionally automatically disable logical replication subscriptions on error

From: "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>
To: 'Greg Nancarrow' <gregn4422(at)gmail(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Smith, Peter" <peters(at)fast(dot)au(dot)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Optionally automatically disable logical replication subscriptions on error
Date: 2021-11-11 09:20:44
Message-ID: TYCPR01MB8373D8A622ED51CDEF3DAF5DED949@TYCPR01MB8373.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wednesday, November 10, 2021 1:23 PM Greg Nancarrow <gregn4422(at)gmail(dot)com> wrote:
> On Wed, Nov 10, 2021 at 12:26 PM osumi(dot)takamichi(at)fujitsu(dot)com
> <osumi(dot)takamichi(at)fujitsu(dot)com> wrote:
> >
> > On Monday, November 8, 2021 10:15 PM vignesh C <vignesh21(at)gmail(dot)com>
> wrote:
> > > Thanks for the updated patch. Please create a Commitfest entry for
> > > this. It will help to have a look at CFBot results for the patch,
> > > also if required rebase and post a patch on top of Head.
> > As requested, created a new entry for this - [1]
> >
> > FYI: the skip xid patch has been updated to v20 in [2] but the v3 for
> > disable_on_error is not affected by this update and still applicable
> > with no regression.
> >
> > [1] - https://commitfest.postgresql.org/36/3407/
> > [2] -
> >
> https://www.postgresql.org/message-id/CAD21AoAT42mhcqeB1jPfRL1%2B
> EUHbZ
> > k8MMY_fBgsyZvJeKNpG%2Bw%40mail.gmail.com
>
> I had a look at this patch and have a couple of initial review comments for some
> issues I spotted:
Thank you for checking it.

> src/backend/commands/subscriptioncmds.c
> (1) bad array entry assignment
> The following code block added by the patch assigns
> "values[Anum_pg_subscription_subdisableonerr - 1]" twice, resulting in it
> being always set to true, rather than the specified option value:
>
> + if (IsSet(opts.specified_opts, SUBOPT_DISABLE_ON_ERR)) {
> + values[Anum_pg_subscription_subdisableonerr - 1]
> + = BoolGetDatum(opts.disableonerr);
> + values[Anum_pg_subscription_subdisableonerr - 1]
> + = true;
> + }
>
> The 2nd line is meant to instead be
> "replaces[Anum_pg_subscription_subdisableonerr - 1] = true".
> (compare to handling for other similar options)
Oops, fixed.

> src/backend/replication/logical/worker.c
> (2) unreachable code?
> In the patch code there seems to be some instances of unreachable code after
> re-throwing errors:
>
> e.g.
>
> + /* If we caught an error above, disable the subscription */ if
> + (disable_subscription) {
> + ReThrowError(DisableSubscriptionOnError(cctx));
> + MemoryContextSwitchTo(ecxt);
> + }
>
> + else
> + {
> + PG_RE_THROW();
> + MemoryContextSwitchTo(ecxt);
> + }
>
>
> + if (disable_subscription)
> + {
> + ReThrowError(DisableSubscriptionOnError(cctx));
> + MemoryContextSwitchTo(ecxt);
> + }
>
> I'm guessing it was intended to do the "MemoryContextSwitch(ecxt);"
> before re-throwing (?), but it's not really clear, as in the 1st and 3rd cases, the
> DisableSubscriptionOnError() calls anyway immediately switch the memory
> context to cctx.
You are right I think.
Fixed based on an idea below.

After an error happens, for some additional work
(e.g. to report the stats of table sync/apply worker
by pgstat_report_subworker_error() or
to update the catalog by DisableSubscriptionOnError())
restore the memory context that is used before the error (cctx)
and save the old memory context of error (ecxt). Then,
do the additional work and switch the memory context to the ecxt
just before the rethrow. As you described,
in contrast to PG_RE_THROW, DisableSubscriptionOnError() changes
the memory context immediatedly at the top of it,
so for this case, I don't call the MemoryContextSwitchTo().

Another important thing as my modification
is a case when LogicalRepApplyLoop failed and
apply_error_callback_arg.command == 0. In the original
patch of skip xid, it just calls PG_RE_THROW()
but my previous v3 codes missed this macro in this case.
Therefore, I've fixed this part as well.

C codes are checked by pgindent.

Note that this depends on the v20 skip xide patch in [1]

[1] - https://www.postgresql.org/message-id/CAD21AoAT42mhcqeB1jPfRL1%2BEUHbZk8MMY_fBgsyZvJeKNpG%2Bw%40mail.gmail.com

Best Regards,
Takamichi Osumi

Attachment Content-Type Size
v4-0001-Optionally-disabling-subscriptions-on-error.patch application/octet-stream 22.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message houzj.fnst@fujitsu.com 2021-11-11 09:35:32 RE: [BUG]Invalidate relcache when setting REPLICA IDENTITY
Previous Message wenjing 2021-11-11 08:15:09 Re: [Proposal] Global temporary tables