Re: Handle infinite recursion in logical replication setup

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Handle infinite recursion in logical replication setup
Date: 2022-04-28 17:04:12
Message-ID: CALDaNm2hRyXc4DKPLnBgem_LHBmy=WXFEsm9-xhckye1YXcpPA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Apr 27, 2022 at 11:15 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Here are my review comments for v10-0001 and v10-0002.
>
> (I did not yet look at the v10-0002 TAP tests. I will check those
> separately later).
>
> =================
> v10-0001 comments
> =================
>
> 1.1 src/include/catalog/pg_subscription.h
>
> @@ -70,6 +70,8 @@ CATALOG(pg_subscription,6100,SubscriptionRelationId)
> BKI_SHARED_RELATION BKI_ROW
>
> bool substream; /* Stream in-progress transactions. */
>
> + bool sublocalonly; /* skip copying of remote origin data */
> +
> char subtwophasestate; /* Stream two-phase transactions */
>
> bool subdisableonerr; /* True if a worker error should cause the
>
> 1.1.a Comment should start uppercase.

Modified

> 1.1.b Perhaps it is better to say "Skip replication of" instead of
> "Skip copying" ?

Modified

> ~~~
>
> 1.2 src/include/catalog/pg_subscription.h
>
> @@ -110,6 +112,7 @@ typedef struct Subscription
> bool binary; /* Indicates if the subscription wants data in
> * binary format */
> bool stream; /* Allow streaming in-progress transactions. */
> + bool local_only; /* Skip copying of remote origin data */
> char twophasestate; /* Allow streaming two-phase transactions */
>
> Same comment as #1.1

Modified

> =================
> v10-0002 comments
> =================
>
> 2.1 Commit message
>
> b) user wil have to create another subscription subscribing from
> Node2 to Node3 using local_only option and copy_data as true.
>
> Typo: "wil"

Modified

> ~~~
>
> 2.2 Commit message
>
> Here when user has specified local_only 'on' which indicates that the
> publisher should only replicate the changes that are generated locally, but in
> this case since the publisher node is also subscribing data from other nodes,
> the publisher node can have remotely originated data, so throw an error in this
> case to prevent remotely generated data being replicated to the subscriber. If
> user still intends to continue with the operation user can specify copy_data
> as 'force' and proceed.
>
> SUGGESTED (minor rewording)
> In the scenario above the user has specified local_only 'on' (which
> indicates that the publisher should only replicate the changes that
> are generated locally), but in this case, the publisher node is also
> subscribing data from other nodes, so the publisher node may have
> remotely originated data. We throw an error, in this case, to draw
> attention to there being possible remote data. If the user still
> wishes to continue with the operation user can specify copy_data as
> 'force' and proceed.

Modified

> ~~~
>
> 2.3 doc/src/sgml/logical-replication.sgml
>
> + <para>
> + Bidirectional replication is useful in creating multi master database
> + which helps in performing read/write operations from any of the nodes.
> + Setting up bidirectional logical replication between two nodes requires
> + creation of the publication in all the nodes, creating subscription in
> + each of the nodes that subcribes to data from all the nodes. The steps
> + to create two node bidirectional replication is listed below:
> + </para>
>
> 2.3.a Typo: "subcribes"

Modified

> 2.3.b Wording: "creating subscription” -> "and creating subscriptions..."

Modified

> 2.3.c Wording: "The steps to create two node bidirectional replication
> is listed below:" -> "The steps to create a two-node bidirectional
> replication are given below:"

Modified

> ~~~
>
> 2.4 doc/src/sgml/logical-replication.sgml
>
> +<programlisting>
> +node2=# CREATE SUBSCRIPTION sub_node1_node2
> +node2=# CONNECTION 'dbname=foo host=node1 user=repuser'
> +node2=# PUBLICATION pub_node1
> +node2=# WITH (copy_data = off, local_only = on);
> +CREATE SUBSCRIPTION
> +</programlisting>
>
> I am not sure if those psql continuation prompts are right. Shouldn't
> they be "node2-#" instead of "node2=#"?

Modified

> ~~~
>
> 2.5 doc/src/sgml/logical-replication.sgml
>
> +<programlisting>
> +node2=# CREATE SUBSCRIPTION sub_node1_node2
> +node2=# CONNECTION 'dbname=foo host=node1 user=repuser'
> +node2=# PUBLICATION pub_node1
> +node2=# WITH (copy_data = off, local_only = on);
> +CREATE SUBSCRIPTION
> +</programlisting>
> + </para>
>
> IIUC the closing </para> should be on the same line as the
> </programlisting>. I recall there was some recent github push about
> this sometime in the last month - maybe you can check to confirm it.

I have seen the programlisting across multiple places and noticed
that there is no such guideline being followed. I have not made any
change for this comment.

> ~~~
>
> 2.6 doc/src/sgml/logical-replication.sgml
>
> + <para>
> + Create the subscription in node2 to subscribe the changes from node1:
> +<programlisting>
> +node2=# CREATE SUBSCRIPTION sub_node1_node2
>
> IMO the naming convention here is backwards/confusing. E.g. The
> "pub_node1" is the publisher at node1. So similarly, I think the
> subscriber at node2 should be called "sub_node2_node1" (not
> "sub_node1_node2")

Modified

> ~~~
>
> 2.7 doc/src/sgml/logical-replication.sgml
>
> + <para>
> + Adding a new node node3 to the existing node1 and node2 requires setting
> + up subscription in node1 and node2 to replicate the data from node3 and
> + setting up subscription in node3 to replicate data from node1 and node2.
> + The steps for the same is listed below:
> + </para>
>
> There are several sections that say "The steps for the same is listed
> below:". The sentence for all of them seemed redundant to me.

Removed it.

> ~~~
>
> 2.8 doc/src/sgml/logical-replication.sgml
>
> + <para>
> + Adding a new node node3 to the existing node1 and node2 when data is
> + present in existing nodes node1 and node2 needs similar steps, only change
> + required here is that node3 should create subscription with copy_data as
> + force to one of the existing nodes to receive the existing data during
> + initial data synchronization. The steps for the same is listed below:
> + </para>
>
> I thought it should be 2 sentences. So "needs similar steps, only
> change required here" -> "needs similar steps. The only change
> required here..."

Modified

> ~~~
>
> 2.9 doc/src/sgml/logical-replication.sgml
>
> I think every time you say option names or values like "copy_data" in
> the text they should be using <literal> font.

Modified

> ~~~
>
> 2.10 doc/src/sgml/logical-replication.sgml
>
> + <para>
> + Adding a new node node3 to the existing node1 and node2 when data is
> + present in the new node node3 needs similar steps, few changes are
> + required here to get the existing data from node3 to node1 and node2 and
> + later cleaning up of data in node3 before synchronization of all the data
> + from the existing nodes. The steps for the same is listed below:
> + </para>
>
> I thought it should be 2 sentences. So "needs similar steps, few
> changes are required here" -> "needs similar steps. A few changes are
> required here..."

Modified

> ~~~
>
> 2.11 doc/src/sgml/logical-replication.sgml
>
> All the text that says "create subscription" and "create publication"
> maybe should be change to "create a subscription" and "create a
> publication" etc.

Modified

> ~~~
>
> 2.12 doc/src/sgml/ref/alter_subscription.sgml - copy_data
>
> + <para>
> + There is some interaction between the "local_only" option and
> + "copy_data" option. Refer to the
> + <xref linkend="sql-createsubscription-notes" /> for interaction
> + details and usage of force for copy_data option.
> </para>
>
> 2.12.a Maybe options should not be in quotes like that. I think they
> should be <literal> font.

Modified

> 2.12.b It is a bit misleading because there is no "Notes" section here
> on this page. Maybe it should say refer to the CREATE SUBSCRIPTION
> Notes.

Modified

> 2.12.c The last copy_data should also be <literal> font.

Modified

> ~~~
>
> 2.13 doc/src/sgml/ref/create_subscription.sgml - local_only
>
> </para>
> + <para>
> + There is some interaction between the "local_only" option and
> + "copy_data" option. Refer to the
> + <xref linkend="sql-createsubscription-notes" /> for details.
> + </para>
> </listitem>
>
> 2.13.a Maybe options should not be in quotes like that. I think they
> should be <literal> font.

Modified

> 2.13.b The last copy_data should also be <literal> font.

Modified

> ~~~
>
> 2.14 doc/src/sgml/ref/create_subscription.sgml - copy_data
>
>
> <para>
> Specifies whether to copy pre-existing data in the publications
> - that are being subscribed to when the replication starts.
> - The default is <literal>true</literal>.
> + that are being subscribed to when the replication starts. This
> + parameter may be either <literal>true</literal>,
> + <literal>false</literal> or <literal>force</literal>. The default is
> + <literal>true</literal>.
> </para>
>
> 2.14.a Maybe options should not be in quotes like that. I think they
> should be <literal> font.

Modified

> 2.14.b The last copy_data should also be <literal> font.

Modified

> ~~~
>
> 2.15 doc/src/sgml/ref/create_subscription.sgml
>
> @@ -374,6 +388,16 @@ CREATE SUBSCRIPTION <replaceable
> class="parameter">subscription_name</replaceabl
> can have non-existent publications.
> </para>
>
> + <para>
> + If subscription is created with local_only as 'on' and copy_data as 'on', it
> + will check if the publisher tables are being subscribed to any other
> + publisher and throw an error to prevent inconsistent data in the
> + subscription. User can continue with the copy operation without throwing any
> + error in this case by specifying copy_data as 'force'. Refer to the
> + <xref linkend="bidirectional-logical-replication"/> on how
> + copy_data and local_only can be used in bidirectional replication.
> + </para>
>
> 2.15.a I think all those mentions of copy_data and local_only and
> their values should be in <literal> font instead of in quotes.

Modified

> 2.15.b Wording: "User can" -> "The user can"

Modified

> ~~~
>
> 2.16 src/backend/commands/subscriptioncmds.c - macro
>
> > 20. src/backend/commands/subscriptioncmds.c - IS_COPY_DATA_ON_OR_FORCE
> >
> > @@ -69,6 +69,18 @@
> > /* check if the 'val' has 'bits' set */
> > #define IsSet(val, bits) (((val) & (bits)) == (bits))
> >
> > +#define IS_COPY_DATA_ON_OR_FORCE(copy_data) (copy_data != COPY_DATA_OFF)
> >
> > Maybe this would be better as a static inline function?
>
> Vignesh: What is the advantage of doing this change? I have not changed this
> as the macro usage is fine. Thoughts?
>
> Originally I was going to suggest the macro should use extra parens
> like ((copy_data) != COPY_DATA_OFF), but then I thought if it was a
> function then it would have enum type-checking which would be better.
> If you want to keep the macro then please make the parens change.

Modified to include parenthesis.

> ~~~
>
> 2.17 src/backend/commands/subscriptioncmds.c - DefGetCopyData
>
> + /*
> + * The set of strings accepted here should match up with
> + * the grammar's opt_boolean_or_string production.
> + */
> + if (pg_strcasecmp(sval, "true") == 0 ||
> + pg_strcasecmp(sval, "on") == 0)
> + return COPY_DATA_ON;
> + if (pg_strcasecmp(sval, "false") == 0 ||
> + pg_strcasecmp(sval, "off") == 0)
> + return COPY_DATA_OFF;
> + if (pg_strcasecmp(sval, "force") == 0)
> + return COPY_DATA_FORCE;
>
> I think you can change the order of these to be off/on/force, so then
> the order is consistent with the T_Integer case.

Modified

> ~~~
>
> 2.18 src/backend/commands/subscriptioncmds.c - DefGetCopyData
>
> + ereport(ERROR,
> + errcode(ERRCODE_SYNTAX_ERROR),
> + errmsg("%s requires a boolean or \"force\"", def->defname));
> + return COPY_DATA_OFF; /*
> keep compiler quiet */
>
> Excessive comment indent? Has pg_indent been run?

Modified

> ~~~
>
> 2.19 src/backend/commands/subscriptioncmds.c - AlterSubscription
>
> @@ -1236,7 +1306,8 @@ AlterSubscription(ParseState *pstate,
> AlterSubscriptionStmt *stmt,
> errmsg("ALTER SUBSCRIPTION ... REFRESH is not allowed for disabled
> subscriptions")));
>
> parse_subscription_options(pstate, stmt->options,
> - SUBOPT_COPY_DATA, &opts);
> + SUBOPT_COPY_DATA,
> + &opts);
>
> This is a formatting change only. Maybe it does not belong in this
> patch unless it is the result of pg_indent.

Modified

> ~~~
>
> 2.20 src/backend/commands/subscriptioncmds.c - fetch_table_list
>
> > 23. src/backend/commands/subscriptioncmds.c - long errmsg
> >
> > + if (copydata == COPY_DATA_ON && local_only && !slot_attisnull(slot, 3))
> > + ereport(ERROR,
> > + errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> > + errmsg("CREATE/ALTER SUBSCRIPTION with local_only and copy_data as
> > true is not allowed when the publisher might have replicated data,
> > table:%s.%s might have replicated data in the publisher",
> > + nspname, relname),
> > + errhint("Use CREATE/ALTER SUBSCRIPTION with copy_data = off or force"));
> > +
> >
> > The errmsg seems way too long for the source code. Can you use string
> > concatenation or continuation chars to wrap the message over multiple
> > lines?
>
> Vignesh: I had seen that the long error message elsewhere also is in a
> single line. I think we should keep it as it is to maintain the coding
> standard. Thoughts?
>
> OK, if you say it is already common practice then it's fine by me to
> leave it as-is.

Ok, no change done.

> ~~~
>
> 2.21 src/test/regress/expected/subscription.out - make check
>
> make check fails.
> 1 of 214 tests failed.
>
> 2.21.a It looks like maybe you did not update the expected ordering of
> some of the tests, after some minor adjustments in subscriprion.sql in
> v10. So the expected output needs to be fixed in the patch.

Modified

> 2.21.b. Suggest adding this patch to CF so that the cfbot can pick up
> such test problems earlier.

CF entry added

Thanks for the comments, the attached v11 patch has the changes for the same.

Regards,
Vignesh

Attachment Content-Type Size
v11-0001-Skip-replication-of-non-local-data.patch text/x-patch 53.7 KB
v11-0002-Support-force-option-for-copy_data-check-and-thr.patch text/x-patch 47.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2022-04-28 17:07:22 Re: Handle infinite recursion in logical replication setup
Previous Message Robert Haas 2022-04-28 16:53:39 Re: Re: fix cost subqueryscan wrong parallel cost