Re: Handle infinite recursion in logical replication setup

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>
Subject: Re: Handle infinite recursion in logical replication setup
Date: 2022-08-30 17:05:30
Message-ID: CALDaNm1V+AvzPsUcq=mNYG+JfAyovTWBe4vWKTknOgH_ko1E0Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 29 Aug 2022 at 11:59, Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Here are some review comments for patch v42-0001:
>
> ======
>
> 1. Commit message
>
> A later review comment below suggests some changes to the WARNING
> message so if those changes are made then the example in this commit
> message also needs to be modified.

Modified

> ======
>
> 2. doc/src/sgml/ref/alter_subscription.sgml - copy_data
>
> Refer to the Notes for the usage of true for copy_data parameter and
> its interaction with the origin parameter.
>
> I think saying "usage of true" sounded a bit strange.
>
> SUGGESTION
> Refer to the Notes about how copy_data = true can interact with the
> origin parameter.

Modified

> ======
>
> 3. doc/src/sgml/ref/create_subscription.sgml - copy data
>
> Refer to the Notes for the usage of true for copy_data parameter and
> its interaction with the origin parameter.
>
> SUGGESTION
> (same as #2)

Modified

> ~~
>
> 4. doc/src/sgml/ref/create_subscription.sgml - origin
>
> Refer to the Notes for the usage of true for copy_data parameter and
> its interaction with the origin parameter.
>
> SUGGESTION
> (same as #2)

Modified

> ~~~
>
> 5. doc/src/sgml/ref/create_subscription.sgml - Notes
>
> + <para>
> + If the subscription is created with <literal>origin = none</literal> and
> + <literal>copy_data = true</literal>, it will check if the publisher has
> + subscribed to the same table from other publishers and, if so, throw a
> + warning to notify user to check the publisher tables. The user can ensure
>
> "throw a warning to notify user" -> "log a warning to notify the user"

Modified

> ~~
>
> 6.
>
> + warning to notify user to check the publisher tables. The user can ensure
> + that publisher tables do not have data which has an origin associated before
> + continuing with any other operations to prevent inconsistent data being
> + replicated.
> + </para>
>
> 6a.
>
> I'm not so sure about this. IMO the warning is not about the
> replication – really it is about the COPY which has already happened
> anyway. So the user can't really prevent anything from going wrong;
> instead, they have to take some action to clean up if anything did go
> wrong.
>
> SUGGESTION
> Before continuing with other operations the user should check that
> publisher tables did not have data with different origins, otherwise
> inconsistent data may have been copied.

Modified based on the suggestion provided by Amit.

> ~
>
> 6b.
>
> I am also wondering what can the user do now. Assuming there was bad
> COPY then the subscriber table has already got unwanted stuff copied
> into it. Is there any advice we can give to help users fix this mess?

There is nothing much a user can do in this case. Only option would be
to take a backup before the operation and restore it and then recreate
the replication setup. I was not sure if we should document these
steps as similar inconsistent data could get created without the
origin option . Thoughts?

> ======
>
> 7. src/backend/commands/subscriptioncmds.c - AlterSubscription_refresh
>
> + /* Check whether we can allow copy of newly added relations. */
> + check_pub_table_subscribed(wrconn, sub->publications, copy_data,
> + sub->origin, subrel_local_oids,
> + subrel_count);
>
> "whether we can allow" seems not quite the right wording here anymore,
> because now there is no ERROR stopping this - so if there was unwanted
> data the COPY will proceed to copy it anyhow...

Removed the comment as the function details the necessary things.

> ~~~
>
> 8. src/backend/commands/subscriptioncmds.c - check_pub_table_subscribed
>
> @@ -1781,6 +1794,121 @@ AlterSubscriptionOwner_oid(Oid subid, Oid newOwnerId)
> table_close(rel, RowExclusiveLock);
> }
>
> +/*
> + * Check and throw a warning if the publisher has subscribed to the same table
> + * from some other publisher. This check is required only if "copy_data = true"
> + * and "origin = none" for CREATE SUBSCRIPTION and
> + * ALTER SUBSCRIPTION ... REFRESH statements to notify user that data having
> + * origin might have been copied.
>
> "throw a warning" -> "log a warning"
>
> "to notify user" -> "to notify the user" ?

Modified

> ~~~
>
> 9.
>
> + if (copydata == false || !origin ||
> + (pg_strcasecmp(origin, LOGICALREP_ORIGIN_NONE) != 0))
> + return;
>
> SUGGESTION
> if (!copydata || !origin ||
> (pg_strcasecmp(origin, LOGICALREP_ORIGIN_NONE) != 0))

Modified

> ~~~
>
> 10. (Question)
>
> I can't tell just by reading the code if FOR ALL TABLES case is
> handled – e.g. will this recognise the case were the publisher might
> have table data from other origins because it has a subscription on
> some other node that was publishing "FOR ALL TABLES"?

Yes it handles it.

> ~~~
>
> 11.
>
> + ereport(WARNING,
> + errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("publisher has subscribed table \"%s.%s\" from some other publisher",
> + nspname, relname),
> + errdetail("Publisher might have subscribed one or more tables from
> some other publisher."),
> + errhint("Verify that these publisher tables do not have data that
> has an origin associated before proceeding to avoid inconsistency."));
> + break;
>
> 11a.
> I'm not sure having that "break" code is correct logic. Won't that
> mean the user will only get a warning for the first potential problem
> encountered but then other potential problems tables will have no
> warnings at all so the user may not be made aware of them? Perhaps you
> need to first gather the list of all the suspicious tables before
> logging a single warning which shows that list? Or perhaps you need to
> log multiple warnings – one warning per suspicious table?

Modified to get the list of all the tables, I have limited it to
display 100 tables followed by ... to indicate there might be more
tables. I did not want to loga warning message with many 1000's of
tables which will not help the readability of the message.

> ~
>
> 11b.
> The WARNING message seems a bit inside-out because the errmsg is
> giving more details than the errdetail.
>
> SUGGESTIONS (or something similar)
> errmsg - "subscription XXX requested origin=NONE but may have copied
> data that had a different origin."
> errdetail – Publisher YYY has subscribed table \"%s.%s\" from some
> other publisher"

We don't have the tables based on each publisher, it is for all the
publishers. I have changed the errdetail message slightly.

> ~
>
> 11c.
> The errhint sentence is unusual.
>
> BEFORE
> "Verify that these publisher tables do not have data that has an
> origin associated before proceeding to avoid inconsistency."
>
> SUGGESTION (or something similar)
> Before proceeding, verify that initial data copied from the publisher
> tables did not come from other origins.

Modified

> ======
>
> 12. src/test/subscription/t/030_origin.pl
>
> +# have remotely originated data from node_A. We throw a warning, in this case,
> +# to draw attention to there being possible remote data.
>
> "throw a warning" -> "log a warning" (this occurs 2x)

Modified

The attached v43 patch has the changes for the same.

Regards,
Vignesh

Attachment Content-Type Size
v43-0001-Check-and-log-a-warning-if-the-publisher-has-sub.patch application/octet-stream 20.4 KB
v43-0002-Document-the-steps-for-replication-between-prima.patch application/octet-stream 19.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2022-08-30 17:11:02 Re: Handle infinite recursion in logical replication setup
Previous Message Simon Riggs 2022-08-30 16:45:54 Re: SUBTRANS: Minimizing calls to SubTransSetParent()