Re: Handle infinite recursion in logical replication setup

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>
Subject: Re: Handle infinite recursion in logical replication setup
Date: 2022-08-31 06:05:01
Message-ID: CAHut+PvonTd423-cWqoxh0w8Bd_Po3OToqqyxuR1iMNmxSLr_Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here are my review comments for v43-0001.

======

1. Commit message

The initial copy phase has no way to know the origin of the row data,
so if 'copy_data = true' in the step 4 below, log a warning to notify user
that potentially non-local data might have been copied.

1a.
"in the step 4" -> "in step 4"

~

1b.
"notify user" -> "notify the user"

======

2. doc/src/sgml/ref/create_subscription.sgml

+ <para>
+ If the subscription is created with <literal>origin = none</literal> and
+ <literal>copy_data = true</literal>, it will check if the publisher has
+ subscribed to the same table from other publishers and, if so, log a
+ warning to notify the user to check the publisher tables. Before continuing
+ with other operations the user should check that publisher tables did not
+ have data with different origins to prevent data inconsistency issues on the
+ subscriber.
+ </para>

I am not sure about that wording saying "to prevent data inconsistency
issues" because I thought maybe is already too late because any
unwanted origin data is already copied during the initial sync. I
think the user can do it try to clean up any problems caused before
things become even worse (??)

~~~

You asked for my thoughts (see [1] 6b):

> There is nothing much a user can do in this case. Only option would be
> to take a backup before the operation and restore it and then recreate
> the replication setup. I was not sure if we should document these
> steps as similar inconsistent data could get created without the
> origin option . Thoughts?

I think those cases are a bit different:

- For a normal scenario (i.e. origin=ANY) then inconsistent data just
refers to problems like maybe PK violations so I think you just get
what you get and have to deal with the errors...

- But when the user said origin=NONE they are specifically saying they
do NOT want any non-local data, yet they still might end up getting
data contrary to their wishes. So I think maybe there ought to be
something documented to warn about this potentially happening. My
first impression is that all this seems like a kind of terrible
problem because if it happens then cleanup could be difficult - if
that subscriber in turn was also a publisher then this bad data might
have propagated all over the place already! Anyway, I guess all this
could be mentioned in some (new ?) section of the
logical-replication.sgml file (i.e. patch 0002).

IDEA: I was wondering if the documentation should recommend (or maybe
we can even enforce this) that when using origin=NONE the user should
always create the subscription with enabled=FALSE. That way if there
are some warnings about potential bad origin data then there it might
be possible to take some action or change their mind *before* any
unwanted data pollutes everything.

======

3. src/backend/commands/subscriptioncmds.c - check_pub_table_subscribed

+ if (count == 0)
+ tablenames = makeStringInfo();
+ else
+ appendStringInfoString(tablenames, ", ");
+
+ appendStringInfo(tablenames, "%s.%s", nspname, relname);
+ count++;

I think the quotes are not correct - each separate table name should be quoted.

~~~

4.

+ /*
+ * There might be many tables present in this case, we will display a
+ * maximum of 100 tables followed by "..." to indicate there might be
+ * more tables.
+ */
+ if (count == 100)
+ {
+ appendStringInfoString(tablenames, ", ...");
+ break;
+ }

4a.
IMO this code should be part of the previous if/else

e.g.
if (count == 0) ... makeStringInfo()
else if (count > 100) ... append ellipsis "..." and break out of the loop
else ... append table name to the message

Doing it in this way means "..." definitely means there are more than
100 bad tables.

~

4b.
Changing like above (#4a) simplifies the comment because it removes
doubts like "might be…" since you already know you found the 101st
table.

SUGGESTION (comment)
The message displays a maximum of 100 tables having potentially
non-local data. Any more than that will be indicated by "...".

~

4c.
It still seems a bit disconcerting to me that there can be cases where
bad data was possibly copied but there is no record of what tables
have been polluted. Perhaps when the maximum bad tables are exceeded
then there can be some additional message to tell users what SQL they
can use to discover what all the other bad tables were.

~~~

5.

+ * XXX: For simplicity, we don't check whether the table has any data
+ * or not. If the table doesn't have any data then we don't need to
+ * distinguish between data having origin and data not having origin so
+ * we can avoid throwing a warning in that case.

"throwing a warning" -> "logging a warning"

~~~

6.

+ errdetail("Subscribed table(s) \"%s\" has been subscribed from some
other publisher.",
+ tablenames->data),

I don't think the table name list should be quoted in the errdetail,
because each of the individual table names should have been quoted.
See previous review comment #3.

======

7. src/test/subscription/t/030_origin.pl

+###############################################################################
+# Specify origin as 'none' which indicates that the publisher should only
+# replicate the changes that are generated locally from node_B, but in
+# this case since the node_B is also subscribing data from node_A, node_B can
+# have remotely originated data from node_A. We log a warning, in this case,
+# to draw attention to there being possible remote data.
+###############################################################################

This comment wording makes it seem like this is using different
subscription parameters from the other tests that preceded it, but in
fact the prior tests also been using origin=none. (??)

Maybe it just needs a trivial rewording.

SUGGESTION
Specifying origin=NONE indicates that...

~~~

8.

+like(
+ $stderr,
+ qr/WARNING: ( [A-Z0-9]+:)? subscription tap_sub_a_b_2 requested
origin=NONE but might copy data that had a different origin/,
+ "Create subscription with origin = none and copy_data when the
publisher has subscribed same table"
+);

Is it possible for these checks of the log file to get the errdetail
message too? Then the test will be more readable for the expected
result and you will also have the bonus of testing the table-name list
in the warning more thoroughly.

------
[1] https://www.postgresql.org/message-id/CALDaNm1V%2BAvzPsUcq%3DmNYG%2BJfAyovTWBe4vWKTknOgH_ko1E0Q%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2022-08-31 06:09:16 Re: [PATCH] Add native windows on arm64 support
Previous Message Kyotaro Horiguchi 2022-08-31 05:36:38 Re: pg_rewind WAL segments deletion pitfall