Re: Handle infinite recursion in logical replication setup

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Handle infinite recursion in logical replication setup
Date: 2022-04-08 07:30:04
Message-ID: CALDaNm1ei=rRwCBKWtUu8b5OsS6FFcvaxg9h0oXcjgFn8GoZnQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Apr 5, 2022 at 6:21 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Here are my comments for the latest patch v6-0001.
>
> (I will post my v6-0002 review comments separately)
>
> PATCH v6-0001 comments
> ======================
>
> 1.1 General - Option name
>
> I still feel like the option name is not ideal. Unfortunately, this is
> important because any name change would impact lots of these patch
> files and docs, struct members etc.
>
> It was originally called "local_only", but I thought that as a
> SUBSCRIPTION option that was confusing because "local" means local to
> what? Really it is local to the publisher, not local to the
> subscriber, so that name seemed misleading.
>
> Then I suggested "publish_local_only". Although that resolved the
> ambiguity problem, other people thought it seemed odd to have the
> "publish" prefix for a subscription-side option.
>
> So now it is changed again to "subscribe_local_only" -- It's getting
> better but still, it is implied that the "local" means local to the
> publisher except there is nothing in the option name really to convey
> that meaning. IMO we here all understand the meaning of this option
> mostly by familiarity with this discussion thread, but I think a user
> coming to this for the first time will still be confused by the name.
>
> Below are some other name ideas. Maybe they are not improvements, but
> it might help other people to come up with something better.
>
> subscribe_publocal_only = true/false
> origin = pub_only/any
> adjacent_data_only = true/false
> direct_subscriptions_only = true/false
> ...
>
>
> (FYI, the remainder of these review comments will assume the option is
> still called "subscribe_local_only")

Modified to local_only

> ~~~
>
> 1.2 General - inconsistent members and args
>
> IMO the struct members and args should also be named for close
> consistency with whatever the option name is.
>
> Currently the option is called "subscription_local_only". So I think
> the members/args would be better to be called "local_only" instead of
> "only_local".

Modified

> ~~~
>
> 1.3 Commit message - wrong option name
>
> The commit message refers to the option name as "publish_local_only"
> instead of the option name that is currently implemented.

Modified

> ~~~
>
> 1.4 Commit message - wording
>
> The wording seems a bit off. Below is suggested simpler wording which
> I AFAIK conveys the same information.
>
> BEFORE
> Add an option publish_local_only which will subscribe only to the locally
> generated data in the publisher node. If subscriber is created with this
> option, publisher will skip publishing the data that was subscribed
> from other nodes. It can be created using following syntax:
> ex: CREATE SUBSCRIPTION sub1 CONNECTION 'dbname =postgres port=9999'
> PUBLICATION pub1 with (publish_local_only = on);
>
> SUGGESTION
> This patch adds a new SUBSCRIPTION boolean option
> "subscribe_local_only". The default is false. When a SUBSCRIPTION is
> created with this option enabled, the publisher will only publish data
> that originated at the publisher node.
> Usage:
> CREATE SUBSCRIPTION sub1 CONNECTION 'dbname =postgres port=9999'
> PUBLICATION pub1 with (subscribe_local_only = true);

Modified

> ~~~
>
> 1.5 doc/src/sgml/ref/create_subscription.sgml - "generated" changes.
>
> + <para>
> + Specifies whether the subscription will request the publisher to send
> + locally generated changes or both the locally generated changes and
> + the replicated changes that was generated from other nodes. The
> + default is <literal>false</literal>.
> + </para>
>
> For some reason, it seemed a bit strange to me to use the term
> "generated" changes. Maybe better to refer to the origin of changes?
>
> SUGGESTION
> Specifies whether the publisher should send only changes that
> originated locally at the publisher node, or send any publisher node
> changes regardless of their origin. The default is false.

Modified

> ~~~
>
> 1.6 src/backend/replication/pgoutput/pgoutput.c -
> LOGICALREP_PROTO_TWOPHASE_VERSION_NUM
>
> @@ -496,6 +509,12 @@ pgoutput_startup(LogicalDecodingContext *ctx,
> OutputPluginOptions *opt,
> else
> ctx->twophase_opt_given = true;
>
> + if (data->only_local && data->protocol_version <
> LOGICALREP_PROTO_TWOPHASE_VERSION_NUM)
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> + errmsg("requested proto_version=%d does not support
> subscribe_local_only, need %d or higher",
> + data->protocol_version, LOGICALREP_PROTO_TWOPHASE_VERSION_NUM)));
>
> I thought this code should not be using
> LOGICALREP_PROTO_TWOPHASE_VERSION_NUM. Shouldn't there be some newly
> introduced constant like LOGICALREP_PROTO_LOCALONLY_VERSION_NUM which
> you will use here?

Modified, I have set LOGICALREP_PROTO_LOCALONLY_VERSION_NUM to same
value as LOGICALREP_PROTO_TWOPHASE_VERSION_NUM, will increment this
once server version is changed.

> ~~~
>
> 1.7 src/bin/pg_dump/pg_dump.c - 150000
>
> @@ -4451,11 +4452,13 @@ getSubscriptions(Archive *fout)
> if (fout->remoteVersion >= 150000)
> appendPQExpBufferStr(query,
> " s.subtwophasestate,\n"
> - " s.subdisableonerr\n");
> + " s.subdisableonerr,\n"
> + " s.sublocal\n");
> else
> appendPQExpBuffer(query,
> " '%c' AS subtwophasestate,\n"
> - " false AS subdisableonerr\n",
> + " false AS subdisableonerr,\n"
> + " false AS s.sublocal\n",
> LOGICALREP_TWOPHASE_STATE_DISABLED);
>
> I think this local_only feature is unlikely to get into the PG15
> release, so this code should be split out into a separate condition
> because later will need to change to say >= 160000.

I have split the condition and checked it with 150000, this will be
changed later to 160000 after the branch is created.

> ~~~
>
> 1.8 src/bin/pg_dump/pg_dump.c - dumpSubscription
>
> @@ -4585,6 +4591,9 @@ dumpSubscription(Archive *fout, const
> SubscriptionInfo *subinfo)
> if (strcmp(subinfo->subdisableonerr, "t") == 0)
> appendPQExpBufferStr(query, ", disable_on_error = true");
>
> + if (strcmp(subinfo->sublocal, "f") != 0)
> + appendPQExpBufferStr(query, ", subscribe_local_only = on");
> +
>
> I felt it is more natural to say "if it is true set to true", instead
> of "if it is not false set to on".
>
> SUGGESTION
> if (strcmp(subinfo->sublocal, "t") == 0)
> appendPQExpBufferStr(query, ", subscribe_local_only = true");

Modified

> ~~~
>
> 1.9 src/bin/psql/describe.c - 150000
>
> @@ -6318,9 +6318,11 @@ describeSubscriptions(const char *pattern, bool verbose)
> if (pset.sversion >= 150000)
> appendPQExpBuffer(&buf,
> ", subtwophasestate AS \"%s\"\n"
> - ", subdisableonerr AS \"%s\"\n",
> + ", subdisableonerr AS \"%s\"\n"
> + ", sublocal AS \"%s\"\n",
> gettext_noop("Two phase commit"),
> - gettext_noop("Disable on error"));
> + gettext_noop("Disable on error"),
> + gettext_noop("Only local"));
>
> I think this local_only feature is unlikely to get into the PG15
> release, so this code should be split out into a separate condition
> because later will need to change to say >= 160000.

I have split the condition and checked it with 150000, this will be
changed later to 160000 after the branch is created.

> ~~
>
> 1.10 src/bin/psql/describe.c - describeSubscriptions column
>
> @@ -6318,9 +6318,11 @@ describeSubscriptions(const char *pattern, bool verbose)
> if (pset.sversion >= 150000)
> appendPQExpBuffer(&buf,
> ", subtwophasestate AS \"%s\"\n"
> - ", subdisableonerr AS \"%s\"\n",
> + ", subdisableonerr AS \"%s\"\n"
> + ", sublocal AS \"%s\"\n",
> gettext_noop("Two phase commit"),
> - gettext_noop("Disable on error"));
> + gettext_noop("Disable on error"),
> + gettext_noop("Only local"));
>
> I think the column name here should be more consistent with the option
> name. e.g. it should be "Local only", not "Only local".

Modified

> ~~~
>
> 1.11 src/bin/psql/tab-complete.c - whitespace
>
> @@ -3167,7 +3167,7 @@ psql_completion(const char *text, int start, int end)
> /* Complete "CREATE SUBSCRIPTION <name> ... WITH ( <opt>" */
> else if (HeadMatches("CREATE", "SUBSCRIPTION") && TailMatches("WITH", "("))
> COMPLETE_WITH("binary", "connect", "copy_data", "create_slot",
> - "enabled", "slot_name", "streaming",
> + "enabled", "slot_name", "streaming", "subscribe_local_only",
>
> The patch accidentally added a space char before the "slot_name".

Modified

> ~~~
>
> 1.12 src/include/replication/walreceiver.h - "generated"
>
> @@ -183,6 +183,7 @@ typedef struct
> bool streaming; /* Streaming of large transactions */
> bool twophase; /* Streaming of two-phase transactions at
> * prepare time */
> + bool only_local; /* publish only locally generated data */
>
> This is a similar review comment as #1.5 about saying the word "generated".
> Maybe there is another way to word this?

Modified

> ~~~
>
> 1.13 src/test/regress/sql/subscription.sql - missing test case
>
> Isn't there a missing test case for ensuring that the new option is boolean?

Added test

Thanks for the comments, the attached v7 patch has the changes for the same.

Regards,
Vignesh

Attachment Content-Type Size
v7-0001-Skip-replication-of-non-local-data.patch text/x-patch 51.6 KB
v7-0002-Support-force-option-for-copy_data-check-and-thro.patch text/x-patch 43.6 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2022-04-08 07:40:05 Re: Handle infinite recursion in logical replication setup
Previous Message Jeff Davis 2022-04-08 07:27:44 pgsql: Add contrib/pg_walinspect.