Re: Optionally automatically disable logical replication subscriptions on error

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>, "Smith, Peter" <peters(at)fast(dot)au(dot)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Optionally automatically disable logical replication subscriptions on error
Date: 2022-03-01 00:49:14
Message-ID: CAHut+Pvfmav1MaGKNZR+PZqCT6wZv1ekTi5E+LuWYFH++ek7rA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Please see below my review comments for v22.

======

1. Commit message

"table sync worker" -> "tablesync worker"

~~~

2. doc/src/sgml/catalogs.sgml

+ <para>
+ If true, the subscription will be disabled when subscription
+ workers detect any errors
+ </para></entry>

It felt a bit strange to say "subscription" 2x in the sentence, but I
am not sure how to improve it. Maybe like below?

BEFORE
If true, the subscription will be disabled when subscription workers
detect any errors

SUGGESTED
If true, the subscription will be disabled if one of its workers
detects an error

~~~

3. src/backend/replication/logical/worker.c - DisableSubscriptionOnError

@@ -2802,6 +2803,69 @@ LogicalRepApplyLoop(XLogRecPtr last_received)
}

/*
+ * Disable the current subscription, after error recovery processing.
+ */
+static void
+DisableSubscriptionOnError(void)

I thought the "after error recovery processing" part was a bit generic
and did not really say what it was doing.

BEFORE
Disable the current subscription, after error recovery processing.
SUGGESTED
Disable the current subscription, after logging the error that caused
this function to be called.

~~~

4. src/backend/replication/logical/worker.c - start_apply

+ if (MySubscription->disableonerr)
+ {
+ DisableSubscriptionOnError();
+ return;
+ }
+
+ MemoryContextSwitchTo(ecxt);
+ PG_RE_THROW();
+ }
+ PG_END_TRY();

The current code looks correct, but I felt it is a bit tricky to
easily see the execution path after the return.

Since it will effectively just exit anyhow I think it will be simpler
just to do that explicitly right here instead of the 'return'. This
will also make the code consistent with the same 'disableonerr' logic
of the start_start_sync.

SUGGESTION
if (MySubscription->disableonerr)
{
DisableSubscriptionOnError();
proc_exit(0);
}

~~~

5. src/bin/pg_dump/pg_dump.c

@@ -4463,6 +4473,9 @@ dumpSubscription(Archive *fout, const
SubscriptionInfo *subinfo)
if (strcmp(subinfo->subtwophasestate, two_phase_disabled) != 0)
appendPQExpBufferStr(query, ", two_phase = on");

+ if (strcmp(subinfo->subdisableonerr, "f") != 0)
+ appendPQExpBufferStr(query, ", disable_on_error = true");
+

Although the code is correct, I think it would be more natural to set
this option as true when the user wants it true. e.g. check for "t"
same as 'subbinary' is doing. This way, even if there was some
unknown/corrupted value the code would do nothing, which is the
behaviour you want...

SUGGESTION
if (strcmp(subinfo->subdisableonerr, "t") == 0)

~~~

6. src/include/catalog/pg_subscription.h

@@ -67,6 +67,9 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId)
BKI_SHARED_RELATION BKI_ROW

char subtwophasestate; /* Stream two-phase transactions */

+ bool subdisableonerr; /* True if occurrence of apply errors
+ * should disable the subscription */

The comment seems not quite right because it's not just about apply
errors. E.g. I think any error in tablesync will cause disablement
too.

BEFORE
True if occurrence of apply errors should disable the subscription
SUGGESTED
True if a worker error should cause the subscription to be disabled

~~~

7. src/test/regress/sql/subscription.sql - whitespace

+-- now it works
+CREATE SUBSCRIPTION regress_testsub CONNECTION
'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect =
false, disable_on_error = false);
+
+\dRs+
+
+ALTER SUBSCRIPTION regress_testsub SET (disable_on_error = true);
+
+\dRs+
+ALTER SUBSCRIPTION regress_testsub SET (slot_name = NONE);
+DROP SUBSCRIPTION regress_testsub;
+

I think should be a blank line after that last \dRs+ just like the
other one, because it belongs logically with the code above it, not
with the ALTER slot_name.

~~~

8. src/test/subscription/t/028_disable_on_error.pl - filename

The 028 number needs to be bumped because there is already a TAP test
called 028 now

~~~

9. src/test/subscription/t/028_disable_on_error.pl - missing test

There was no test case for the last combination where the user correct
the apply worker problem: E.g. After a previous error/disable of the
subscriber, remove the index, publish the inserts again, and check
they get applied properly.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2022-03-01 01:11:03 Re: definition of CalculateMaxmumSafeLSN
Previous Message Masahiko Sawada 2022-03-01 00:46:32 Re: Showing I/O timings spent reading/writing temp buffers in EXPLAIN