Re: Handle infinite recursion in logical replication setup

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Handle infinite recursion in logical replication setup
Date: 2022-04-05 02:47:04
Message-ID: CAHut+PvwqwzuoQio-hJniarDUOTyxFrwrS9hucd47gEW-9wu-g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here are my comments for the latest patch v6-0002.

PATCH v6-0002 comments
======================

2.1 General - should this be an independent patch?

In many ways, I think most of this patch is unrelated to the other
"local_only" patch (v6-0001).

For example, IIUC even in the current HEAD, we could consider it to be
a user error if multiple SUBSCRIPTIONS or multiple PUBLICATIONS of the
same SUBSCRIPTION are replicating to the same TABLE on the same node
and using "copy_data = on".

So I think it would be ok to throw an ERROR if such a copy_data clash
is detected, and then the user will have to change to use "copy_data =
off" for some/all of them to avoid data duplication.

The "local_only" option only adds some small logic to this new ERROR,
but it's not really a prerequisite at all.

e.g. this whole ERROR part of the patch can be a separate thread.

~~~

2.2 General - can we remove the "force" enum?

Now, because I consider the clashing "copy_data = on" ERROR to be a
user error, I think that is something that the user can already take
care of themselves just using the copy_data = off.

I did not really like the modifying of the "copy_data" option from
just boolean to some kind of hybrid boolean + "force".

a) It smells a bit off to me. IMO replication is supposed to end up
with the same (replicated) data on the standby machine but this
"force" mode seems to be just helping the user to break that concept
and say - "I know what I'm doing, and I don't care if I get lots of
duplicated data in the replica table - just let me do it"...

b) It also somehow feels like the new "force" was introduced mostly to
make the code ERROR handling implementation simpler, rather than to
make the life of the end-user better. Yes, if force is removed maybe
the copy-clash-detection-code will need to be internally quite more
complex than it is now, but that is how it should be, instead of
putting extra burden on the user (e.g. by complicating the PG docs and
giving them yet more alternatives to configure). I think any clashing
copy_data options really is a user error, but also I think the current
boolean copy_data true/false already gives the user a way to fix it.

c) Actually, your new error hint messages are similar to my
perspective: They already say "Use CREATE/ALTER SUBSCRIPTION with
copy_data = off or force". All I am saying is remove the "force", and
the user can still fix the problem just by using "copy_data = off"
appropriately.

======

So (from above) I am not much in favour of the copy_data becoming a
hybrid enum and using "force", yet that is what most of this patch is
implementing. Anyway, the remainder of my review comments here are for
the code in its current form. Maybe if "force" can be removed most of
the following comments end up being redundant.

======

2.3 Commit message - wording

This message is difficult to understand.

I think that the long sentence "Now when user is trying..." can be
broken into more manageable parts.

This part "and throw an error so that user can handle the initial copy
data." also seemed a bit vague.

~~~

2.4 Commit message - more functions

"This patch does couple of things:"

IIUC, there seems a third thing implemented by this patch but not
described by the comment. I think it also adds support for ALTER
SUBSCRIPTION SET PUBLICATION WITH (subscribe_local_only)

~~~

2.5 doc/src/sgml/ref/create_subscription.sgml - wording

@@ -161,6 +161,13 @@ CREATE SUBSCRIPTION <replaceable
class="parameter">subscription_name</replaceabl
the replicated changes that was generated from other nodes. The
default is <literal>false</literal>.
</para>
+ <para>
+ If the tables in the publication were also subscribing to the data in
+ the publisher from other publishers, it will affect the
+ <command>CREATE SUBSCRIPTION</command> based on the value specified
+ for <literal>copy_data</literal> option. Refer to the
+ <xref linkend="sql-createsubscription-notes" /> for details.
+ </para>

Is there is a simpler way to express all that?

SUGGESTION
There is some interation between the option "subscribe_local_only" and
option "copy_data". Refer to the <xref
linkend="sql-createsubscription-notes" /> for details.

~~~

2.6 doc/src/sgml/ref/create_subscription.sgml - whitespace

@@ -213,18 +220,28 @@ CREATE SUBSCRIPTION <replaceable
class="parameter">subscription_name</replaceabl
</varlistentry>

<varlistentry>
- <term><literal>copy_data</literal> (<type>boolean</type>)</term>
+ <term><literal>copy_data</literal> (<type>enum</type>)</term>
<listitem>
<para>
Specifies whether to copy pre-existing data in the publications
- that are being subscribed to when the replication starts.
- The default is <literal>true</literal>.
+ that are being subscribed to when the replication starts. This
+ parameter may be either <literal>true</literal>,
+ <literal>false</literal> or <literal>force</literal>. The default is
+ <literal>true</literal>.

That last line has trailing whitespace.

~~~

2.7 doc/src/sgml/ref/create_subscription.sgml - wording

+ <para>
+ If the tables in the publication were also subscribing to the data in
+ the publisher from other publishers, it will affect the
+ <command>CREATE SUBSCRIPTION</command> based on the value specified
+ for <literal>subscribe_local_only</literal> option. Refer to the
+ <xref linkend="sql-createsubscription-notes" /> for details.
+ </para>

This is similar to review comment #2.5 which I thought could be
written in a simpler way.

~~~

2.8 doc/src/sgml/ref/create_subscription.sgml

@@ -375,6 +392,42 @@ CREATE SUBSCRIPTION <replaceable
class="parameter">subscription_name</replaceabl
can have non-existent publications.
</para>

+ <para>
+ Let's consider an existing Multi master logical replication setup between
+ Node1 and Node2 that is created using the following steps:
+ a) Node1 - Publication publishing employee table.
+ b) Node2 - Subscription subscribing from publication pub1 with
+ <literal>subscribe_local_only</literal>.
+ c) Node2 - Publication publishing employee table.
+ d) Node1 - Subscription subscribing from publication pub2 with
+ <literal>subscribe_local_only</literal>.
+ Now when user is trying to add another node Node3 to the above Multi master
+ logical replication setup, user will have to create one subscription
+ subscribing from Node1 and another subscription subscribing from Node2 in
+ Node3 using <literal>subscribe_local_only</literal> option and
+ <literal>copy_data</literal> as <literal>true</literal>, while
+ the subscription is created, server will identify that Node2 is subscribing
+ from Node1 and Node1 is subscribing from Node2 and throw an error like:
+ <programlisting>
+postgres=# CREATE SUBSCRIPTION mysub CONNECTION 'host=192.168.1.50
port=5432 user=foo dbname=foodb'
+ PUBLICATION mypublication, insert_only with
(subscribe_local_only=on);
+ERROR: CREATE/ALTER SUBSCRIPTION with subscribe_local_only and
copy_data as true is not allowed when the publisher might have
replicated data, table:public.t1 might have replicated data in the
publisher
+HINT: Use CREATE/ALTER SUBSCRIPTION with copy_data = off or force
+ </programlisting>
+ In this scenario user can solve this based on one of the 2 possibilities,
+ a) If there are no data present in Node1 and Node2, then the user can create
+ the subscriptions to Node1 and Node2 with
+ <literal>subscribe_local_only</literal> as <literal>true</literal> and
+ <literal>copy_data</literal> as <literal>false</literal>. b) If the data is
+ present, then the user can create subscription with
+ <literal>copy_data</literal> as <literal>force</literal> on Node1 and
+ <literal>copy_data</literal> as <literal>false</literal> on Node2, before
+ allowing any operations on the respective tables of Node1 and Node2, in this
+ case <literal>copy_data</literal> is <literal>false</literal> on Node2
+ because the data will be replicated to each other and available on both the
+ nodes.
+ </para>
+

That is a large slab of text in the Notes, so not very easy to digest it.

I'm not sure what to suggest for this -
- Perhaps the a,b,c,d should all be "lists" so it renders differently?
- It almost seems like too much information to be included in this
"Notes" section, Maybe it needs its own full page in PG Docs "Logical
Replication" to discuss this topic.

~~~

2.9 src/backend/commands/subscriptioncmds.c - IS_COPY_DATA_VALID

@@ -69,6 +69,18 @@
/* check if the 'val' has 'bits' set */
#define IsSet(val, bits) (((val) & (bits)) == (bits))

+#define IS_COPY_DATA_VALID(copy_data) (copy_data != COPY_DATA_OFF)

The macro seems misnamed because "off" is also "valid". It seems like
it should be called something different like IS_COPY_DATA, or
IS_COPY_DATA_ON_OR_FORCE, etc.

~~~

2.10 src/backend/commands/subscriptioncmds.c - DefGetCopyData

+ /*
+ * The set of strings accepted here should match up with
+ * the grammar's opt_boolean_or_string production.
+ */
+ if (pg_strcasecmp(sval, "true") == 0)
+ return COPY_DATA_ON;
+ if (pg_strcasecmp(sval, "false") == 0)
+ return COPY_DATA_OFF;
+ if (pg_strcasecmp(sval, "on") == 0)
+ return COPY_DATA_ON;
+ if (pg_strcasecmp(sval, "off") == 0)
+ return COPY_DATA_OFF;
+ if (pg_strcasecmp(sval, "force") == 0)
+ return COPY_DATA_FORCE;

Maybe combine the return for true/on and false/off?

~~~

2.11 src/backend/commands/subscriptioncmds.c - fetch_table_list

+ appendStringInfoString(&cmd,
+ "SELECT DISTINCT N.nspname AS schemaname, C.relname AS tablename,
PS.srsubstate as replicated\n"
+ "FROM pg_publication P,\n"
+ "LATERAL pg_get_publication_tables(P.pubname) GPT\n"
+ "LEFT JOIN pg_subscription_rel PS ON (GPT.relid = PS.srrelid),\n"
+ "pg_class C JOIN pg_namespace N ON (N.oid = C.relnamespace)\n"
+ "WHERE C.oid = GPT.relid AND P.pubname in (");
+

That blank line is not needed (it was not there previously) because
the next get_publications_str is a continuation of this SQL.

~~~

2.12 src/backend/commands/subscriptioncmds.c - fetch_table_list comment

+ * It is quite possible that subscriber has not yet pulled data to
+ * the tables, but in ideal cases the table data will be subscribed.
+ * Too keep the code simple it is not checked if the subscriber table
+ * has pulled the data or not.
+ */

Typo "Too" -> "To"

~~~

2.13 src/backend/commands/subscriptioncmds.c - force

+ if (copydata == COPY_DATA_ON && only_local && !slot_attisnull(slot, 3))
+ ereport(ERROR,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("CREATE/ALTER SUBSCRIPTION with subscribe_local_only and
copy_data as true is not allowed when the publisher might have
replicated data, table:%s.%s might have replicated data in the
publisher",
+ nspname, relname),
+ errhint("Use CREATE/ALTER SUBSCRIPTION with copy_data = off or force"));

AFAIK this is the only code of this patch that is distinguishing
between "force" and the "on". As I wrote at the beginning of this post
I feel you can keep this ERROR (maybe the way to detect it needs to be
more complex) and the user can fix it just by using "copy_data = off"
appropriately. So then copy_data does not need to be changed.

~~~

2.14 src/test/regress/sql/subscription.sql - missing tests

The new copy_data is not really an enum true/false/force like the PG
docs claims.

It seems more like some kind of hybrid boolean+force. So if that is
how it is going to be then there are missing test cases to make sure
that values like "on"/"off"/"0"/"1" still are working.

~~~

2.15 src/test/subscription/t/032_circular.pl

@@ -65,6 +69,25 @@ $node_A->safe_psql('postgres', "
PUBLICATION tap_pub_B
WITH (subscribe_local_only = on, copy_data = off)");

+($result, $stdout, $stderr) = $node_A->psql('postgres', "
+ CREATE SUBSCRIPTION tap_sub_A1
+ CONNECTION '$node_B_connstr application_name=$appname_A'
+ PUBLICATION tap_pub_B
+ WITH (subscribe_local_only = on, copy_data = on)");
+like(
+ $stderr,
+ qr/ERROR: CREATE\/ALTER SUBSCRIPTION with
subscribe_local_only and copy_data as true is not allowed when the
publisher might have replicated data/,
+ "Create subscription with subscribe_local_only and copy_data
having replicated table in publisher");
+
+$node_A->safe_psql('postgres', "
+ CREATE SUBSCRIPTION tap_sub_A2
+ CONNECTION '$node_B_connstr application_name=$appname_A'
+ PUBLICATION tap_pub_B
+ WITH (subscribe_local_only = on, copy_data = force)");
+
+$node_A->safe_psql('postgres', "
+ DROP SUBSCRIPTION tap_sub_A2");
+

Maybe underneath it is the same, but from the outside, this looks like
a slightly different scenario from what is mentioned everywhere else
in the patch.

I think it would be better to create a new Node_C (aka Node3) so then
the TAP test can use the same example that you give in the commit
message and the PG docs notes.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Stark 2022-04-05 03:04:04 Re: Temporary tables versus wraparound... again
Previous Message Andres Freund 2022-04-05 02:36:34 Re: shared-memory based stats collector - v68