Re: Handle infinite recursion in logical replication setup

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Handle infinite recursion in logical replication setup
Date: 2022-04-05 00:51:20
Message-ID: CAHut+PvsvrZiPPZK7=_4c+Lc+MCe8peqDKMwbH+_mp1wj0-yOQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here are my comments for the latest patch v6-0001.

(I will post my v6-0002 review comments separately)

PATCH v6-0001 comments
======================

1.1 General - Option name

I still feel like the option name is not ideal. Unfortunately, this is
important because any name change would impact lots of these patch
files and docs, struct members etc.

It was originally called "local_only", but I thought that as a
SUBSCRIPTION option that was confusing because "local" means local to
what? Really it is local to the publisher, not local to the
subscriber, so that name seemed misleading.

Then I suggested "publish_local_only". Although that resolved the
ambiguity problem, other people thought it seemed odd to have the
"publish" prefix for a subscription-side option.

So now it is changed again to "subscribe_local_only" -- It's getting
better but still, it is implied that the "local" means local to the
publisher except there is nothing in the option name really to convey
that meaning. IMO we here all understand the meaning of this option
mostly by familiarity with this discussion thread, but I think a user
coming to this for the first time will still be confused by the name.

Below are some other name ideas. Maybe they are not improvements, but
it might help other people to come up with something better.

subscribe_publocal_only = true/false
origin = pub_only/any
adjacent_data_only = true/false
direct_subscriptions_only = true/false
...

(FYI, the remainder of these review comments will assume the option is
still called "subscribe_local_only")

~~~

1.2 General - inconsistent members and args

IMO the struct members and args should also be named for close
consistency with whatever the option name is.

Currently the option is called "subscription_local_only". So I think
the members/args would be better to be called "local_only" instead of
"only_local".

~~~

1.3 Commit message - wrong option name

The commit message refers to the option name as "publish_local_only"
instead of the option name that is currently implemented.

~~~

1.4 Commit message - wording

The wording seems a bit off. Below is suggested simpler wording which
I AFAIK conveys the same information.

BEFORE
Add an option publish_local_only which will subscribe only to the locally
generated data in the publisher node. If subscriber is created with this
option, publisher will skip publishing the data that was subscribed
from other nodes. It can be created using following syntax:
ex: CREATE SUBSCRIPTION sub1 CONNECTION 'dbname =postgres port=9999'
PUBLICATION pub1 with (publish_local_only = on);

SUGGESTION
This patch adds a new SUBSCRIPTION boolean option
"subscribe_local_only". The default is false. When a SUBSCRIPTION is
created with this option enabled, the publisher will only publish data
that originated at the publisher node.
Usage:
CREATE SUBSCRIPTION sub1 CONNECTION 'dbname =postgres port=9999'
PUBLICATION pub1 with (subscribe_local_only = true);

~~~

1.5 doc/src/sgml/ref/create_subscription.sgml - "generated" changes.

+ <para>
+ Specifies whether the subscription will request the publisher to send
+ locally generated changes or both the locally generated changes and
+ the replicated changes that was generated from other nodes. The
+ default is <literal>false</literal>.
+ </para>

For some reason, it seemed a bit strange to me to use the term
"generated" changes. Maybe better to refer to the origin of changes?

SUGGESTION
Specifies whether the publisher should send only changes that
originated locally at the publisher node, or send any publisher node
changes regardless of their origin. The default is false.

~~~

1.6 src/backend/replication/pgoutput/pgoutput.c -
LOGICALREP_PROTO_TWOPHASE_VERSION_NUM

@@ -496,6 +509,12 @@ pgoutput_startup(LogicalDecodingContext *ctx,
OutputPluginOptions *opt,
else
ctx->twophase_opt_given = true;

+ if (data->only_local && data->protocol_version <
LOGICALREP_PROTO_TWOPHASE_VERSION_NUM)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("requested proto_version=%d does not support
subscribe_local_only, need %d or higher",
+ data->protocol_version, LOGICALREP_PROTO_TWOPHASE_VERSION_NUM)));

I thought this code should not be using
LOGICALREP_PROTO_TWOPHASE_VERSION_NUM. Shouldn't there be some newly
introduced constant like LOGICALREP_PROTO_LOCALONLY_VERSION_NUM which
you will use here?

~~~

1.7 src/bin/pg_dump/pg_dump.c - 150000

@@ -4451,11 +4452,13 @@ getSubscriptions(Archive *fout)
if (fout->remoteVersion >= 150000)
appendPQExpBufferStr(query,
" s.subtwophasestate,\n"
- " s.subdisableonerr\n");
+ " s.subdisableonerr,\n"
+ " s.sublocal\n");
else
appendPQExpBuffer(query,
" '%c' AS subtwophasestate,\n"
- " false AS subdisableonerr\n",
+ " false AS subdisableonerr,\n"
+ " false AS s.sublocal\n",
LOGICALREP_TWOPHASE_STATE_DISABLED);

I think this local_only feature is unlikely to get into the PG15
release, so this code should be split out into a separate condition
because later will need to change to say >= 160000.

~~~

1.8 src/bin/pg_dump/pg_dump.c - dumpSubscription

@@ -4585,6 +4591,9 @@ dumpSubscription(Archive *fout, const
SubscriptionInfo *subinfo)
if (strcmp(subinfo->subdisableonerr, "t") == 0)
appendPQExpBufferStr(query, ", disable_on_error = true");

+ if (strcmp(subinfo->sublocal, "f") != 0)
+ appendPQExpBufferStr(query, ", subscribe_local_only = on");
+

I felt it is more natural to say "if it is true set to true", instead
of "if it is not false set to on".

SUGGESTION
if (strcmp(subinfo->sublocal, "t") == 0)
appendPQExpBufferStr(query, ", subscribe_local_only = true");

~~~

1.9 src/bin/psql/describe.c - 150000

@@ -6318,9 +6318,11 @@ describeSubscriptions(const char *pattern, bool verbose)
if (pset.sversion >= 150000)
appendPQExpBuffer(&buf,
", subtwophasestate AS \"%s\"\n"
- ", subdisableonerr AS \"%s\"\n",
+ ", subdisableonerr AS \"%s\"\n"
+ ", sublocal AS \"%s\"\n",
gettext_noop("Two phase commit"),
- gettext_noop("Disable on error"));
+ gettext_noop("Disable on error"),
+ gettext_noop("Only local"));

I think this local_only feature is unlikely to get into the PG15
release, so this code should be split out into a separate condition
because later will need to change to say >= 160000.

~~~

1.10 src/bin/psql/describe.c - describeSubscriptions column

@@ -6318,9 +6318,11 @@ describeSubscriptions(const char *pattern, bool verbose)
if (pset.sversion >= 150000)
appendPQExpBuffer(&buf,
", subtwophasestate AS \"%s\"\n"
- ", subdisableonerr AS \"%s\"\n",
+ ", subdisableonerr AS \"%s\"\n"
+ ", sublocal AS \"%s\"\n",
gettext_noop("Two phase commit"),
- gettext_noop("Disable on error"));
+ gettext_noop("Disable on error"),
+ gettext_noop("Only local"));

I think the column name here should be more consistent with the option
name. e.g. it should be "Local only", not "Only local".

~~~

1.11 src/bin/psql/tab-complete.c - whitespace

@@ -3167,7 +3167,7 @@ psql_completion(const char *text, int start, int end)
/* Complete "CREATE SUBSCRIPTION <name> ... WITH ( <opt>" */
else if (HeadMatches("CREATE", "SUBSCRIPTION") && TailMatches("WITH", "("))
COMPLETE_WITH("binary", "connect", "copy_data", "create_slot",
- "enabled", "slot_name", "streaming",
+ "enabled", "slot_name", "streaming", "subscribe_local_only",

The patch accidentally added a space char before the "slot_name".

~~~

1.12 src/include/replication/walreceiver.h - "generated"

@@ -183,6 +183,7 @@ typedef struct
bool streaming; /* Streaming of large transactions */
bool twophase; /* Streaming of two-phase transactions at
* prepare time */
+ bool only_local; /* publish only locally generated data */

This is a similar review comment as #1.5 about saying the word "generated".
Maybe there is another way to word this?

~~~

1.13 src/test/regress/sql/subscription.sql - missing test case

Isn't there a missing test case for ensuring that the new option is boolean?

------
Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2022-04-05 01:13:06 Re: Skipping logical replication transactions on subscriber side
Previous Message Andres Freund 2022-04-05 00:39:58 Re: Run pg_amcheck in 002_pg_upgrade.pl and 027_stream_regress.pl?