Re: Handle infinite recursion in logical replication setup

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Handle infinite recursion in logical replication setup
Date: 2022-06-10 08:39:04
Message-ID: CAHut+PtDx8Z0_fjzXTrLT1_6GZn7iB85XAwHCpbTwZfp_DKfvg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Below are some review comments for the patch v18-0004

======

1. Commit message

Document the steps for the following:
a) Create a two-node bidirectional replication when there is no data in both the
nodes.
b) Adding a new node when there is no data in any of the nodes.
c) Adding a new node when data is present in the existing nodes.
d) Generic steps to add a new node to the existing set of nodes.

SUGGESTION (minor changes to make the tense consistent)
Documents the steps for the following:
a) Creating a two-node bidirectional replication when there is no data
in both nodes.
b) Adding a new node when there is no data in any of the nodes.
c) Adding a new node when data is present in the existing nodes.
d) Generic steps for adding a new node to an existing set of nodes.

======

2. doc/src/sgml/logical-replication.sgml - blurb

+ <para>
+ Bidirectional replication is useful in creating a multi-master database
+ which helps in performing read/write operations from any of the nodes.

SUGGESTION
Bidirectional replication is useful for creating a multi-master
database environment for replicating read/write operations performed
by any of the member nodes.

~~~

3. doc/src/sgml/logical-replication.sgml - warning

+ <warning>
+ <para>
+ Setting up bidirectional logical replication across nodes
requires multiple
+ steps to be performed on various nodes. Because not all operations are
+ transactional, the user is advised to take backups.
+ </para>
+ </warning>

SUGGESTION (but keep your formatting)
Setting up bidirectional logical replication requires multiple steps
to be performed on various nodes. Because...

~~~

4. doc/src/sgml/logical-replication.sgml - Setting bidirectional
replication between two nodes

+ <para>
+ The steps to create a two-node bidirectional replication when there is no
+ data in both the nodes <literal>node1</literal> and
+ <literal>node2</literal> are given below:
+ </para>

SUGGESTION (but keep your formatting)
The following steps demonstrate how to create a two-node bidirectional
replication when there is no table data present in both nodes node1
and node2:

~~~

5. doc/src/sgml/logical-replication.sgml - Setting bidirectional
replication between two nodes

+ <para>
+ Lock the required tables of <literal>node1</literal> and
+ <literal>node2</literal> in <literal>EXCLUSIVE</literal> mode until the
+ setup is completed.
+ </para>

Instead of "the required tables" shouldn't this just say "table t1"?

~~~

6. doc/src/sgml/logical-replication.sgml - Setting bidirectional
replication between two nodes

+ <para>
+ Now the bidirectional logical replication setup is complete between
+ <literal>node1</literal>, <literal>node2</literal> and
+ <literal>node2</literal>. Any subsequent changes in one node will
+ replicate the changes to the other nodes.
+ </para>

SUGGESTION (for 2nd sentence, and keep your formatting)
Any incremental changes from node1 will be replicated to node2, and
any incremental changes from node2 will be replicated to node1.

~~~

7. doc/src/sgml/logical-replication.sgml - Adding a new node when
there is no data in any of the nodes

+ <para>
+ Adding a new node <literal>node3</literal> to the existing
+ <literal>node1</literal> and <literal>node2</literal> when there is no data
+ in any of the nodes requires setting up subscription in
+ <literal>node1</literal> and <literal>node2</literal> to replicate the data
+ from <literal>node3</literal> and setting up subscription in
+ <literal>node3</literal> to replicate data from <literal>node1</literal>
+ and <literal>node2</literal>.
+ </para>

SUGGESTION (but keep your formatting)
The following steps demonstrate adding a new node node3 to the
existing node1 and node2 when there is no t1 data in any of the nodes.
This requires creating subscriptions in node1 and node2 to replicate
the data from node3 and creating subscriptions in node3 to replicate
data from node1 and node2.

~~~

8. doc/src/sgml/logical-replication.sgml - Adding a new node when
there is no data in any of the nodes

+ <note>
+ <para>
+ It is assumed that the bidirectional logical replication between
+ <literal>node1</literal> and <literal>node2</literal> is already
completed.
+ </para>
+ </note>

IMO this note should just be a text note in the previous paragraph
instead of an SGML <note>. e.g. "Note: These steps assume that..."

~~~

9. doc/src/sgml/logical-replication.sgml - Adding a new node when
there is no data in any of the nodes

+ <para>
+ Lock the required tables of all the nodes <literal>node1</literal>,
+ <literal>node2</literal> and <literal>node3</literal> in
+ <literal>EXCLUSIVE</literal> mode until the setup is completed.
+ </para>

Instead of "the required tables" shouldn't this just say "table t1"?

~~~

10. doc/src/sgml/logical-replication.sgml - Adding a new node when
there is no data in any of the nodes

+ <para>
+ Now the bidirectional logical replication setup is complete between
+ <literal>node1</literal>, <literal>node2</literal> and
+ <literal>node2</literal>. Any subsequent changes in one node will
+ replicate the changes to the other nodes.
+ </para>

SUGGESTION (2nd sentence)
Incremental changes made in any node will be replicated to the other two nodes.

~~~

11. doc/src/sgml/logical-replication.sgml - Adding a new node when
data is present in the existing nodes

+ <para>
+ Adding a new node <literal>node3</literal> which has no data to the
+ existing <literal>node1</literal> and <literal>node2</literal> when data
+ is present in existing nodes <literal>node1</literal> and
+ <literal>node2</literal> needs similar steps. The only change required
+ here is that <literal>node3</literal> should create a subscription with
+ <literal>copy_data = force</literal> to one of the existing nodes to
+ receive the existing data during initial data synchronization.
+ </para>

SUGGESTION (but keep your formatting)
The following steps demonstrate adding a new node node3 which has no
t1 data to the existing node1 and node2 where t1 data is present. This
needs similar steps; the only change required here is that node3
should create a subscription with copy_data = force to one of the
existing nodes so it can receive the existing t1 data during initial
data synchronization.

~~~

12 doc/src/sgml/logical-replication.sgml - Adding a new node when data
is present in the existing nodes

+ <note>
+ <para>
+ It is assumed that the bidirectional logical replication between
+ <literal>node1</literal> and <literal>node2</literal> is already
completed.
+ The nodes <literal>node1</literal> and <literal>node2</literal> has some
+ pre-existing data in table t1 that is synchronized in both the nodes.
+ </para>
+ </note>

12a.
IMO this note should just be text in the previous paragraph instead of
an SGML <note>. e.g. "Note: These steps assume that..."

12b.
SUGGESTION (minor rewording; keep your formatting)
Note: These steps assume that the bidirectional logical replication
between node1 and node2 is already completed, and the pre-existing
data in table t1 is already synchronized in both those nodes.

~~~

13. doc/src/sgml/logical-replication.sgml - Adding a new node when
data is present in the existing nodes

+ <para>
+ Lock the required tables of <literal>node2</literal> and
+ <literal>node3</literal> in <literal>EXCLUSIVE</literal> mode until the
+ setup is completed. No need to lock the tables in <literal>node1</literal>
+ as any data changes made will be synchronized while creating the
+ subscription with <literal>copy_data</literal> specified as
+ <literal>force</literal>.
+ </para>

13a.
Instead of "the required tables" shouldn't this just say "table t1"?

13b.
SUGGESTION (2nd sentence; keep your formatting)
There is no need to lock table t1 in node1 because any data changes
made will be synchronized while creating the subscription with
copy_data = force.

~~~

14. doc/src/sgml/logical-replication.sgml - Adding a new node when
data is present in the existing nodes

+ <para>
+ Create a subscription in <literal>node3</literal> to subscribe to
+ <literal>node1</literal>. Use <literal>copy_data</literal> specified as
+ <literal>force</literal> so that the existing table data is
+ copied during initial sync:

SUGGESTION (2nd sentence; keep your formatting)
Use copy_data = force so that the existing table data is copied during
the initial sync:

~~~

15. doc/src/sgml/logical-replication.sgml - Adding a new node when
data is present in the existing nodes

+ <para>
+ Create a subscription in <literal>node3</literal> to subscribe to
+ <literal>node2</literal>. Use <literal>copy_data</literal> specified as
+ <literal>off</literal> because the initial table data would have been
+ already copied in the previous step:

SUGGESTION (2nd sentence; keep your formatting)
Use copy_data = off because the initial table data would have been
already copied in the previous step:

~~~

16. doc/src/sgml/logical-replication.sgml - Adding a new node when
data is present in the existing nodes

+ <para>
+ Now the bidirectional logical replication setup is complete between
+ <literal>node1</literal>, <literal>node2</literal> and
+ <literal>node2</literal>. Any subsequent changes in one node will
+ replicate the changes to the other nodes.
+ </para>

16a.
Should say node1, node2 and node3

16b.
SUGGESTION (2nd sentence – same as the previous comment)
Incremental changes made in any node will be replicated to the other two nodes.

~~~

17. doc/src/sgml/logical-replication.sgml - Adding a new node when
data is present in the new node

+ <warning>
+ <para>
+ Adding a new node <literal>node3</literal> to the existing
+ <literal>node1</literal> and <literal>node2</literal> when data is present
+ in the new node <literal>node3</literal> is not possible.
+ </para>
+ </warning>

17a.
IMO
- Not really necessary to name the nodes, because this is a generic statement
- Maybe say "not supported" instead of "not possible". e.g. there is
no ERROR check for this case is there?

SUGGESTION
Adding a new node when data is present in the new node tables is not supported.

17b.
I am not sure but I felt this advice seemed more like an SGML <note>;
note a <warning>

~~~

18. doc/src/sgml/logical-replication.sgml - Generic steps to add a new
node to the existing set of nodes

+ <title>Generic steps to add a new node to the existing set of nodes</title>

SUGGESTION
Generic steps for adding a new node to an existing set of nodes

~~~

19. doc/src/sgml/logical-replication.sgml - Generic steps to add a new
node to the existing set of nodes

+ <para>
+ 1. Create the required publication on the new node.
+ </para>
+ <para>
+ 2. Lock the required tables of the new node in <literal>EXCLUSIVE</literal>
+ mode until the setup is complete. This is required to prevent any
+ modifications from happening in the new node. If data modifications occur
+ after step-3, there is a chance that the modifications will be published to
+ the first node and then synchronized back to the new node while creating
+ subscription in step-5 resulting in inconsistent data.
+ </para>
+ <para>
+ 3. Create subscriptions on existing nodes pointing to publication on
+ the new node with <literal>origin</literal> parameter specified as
+ <literal>local</literal> and <literal>copy_data</literal> specified as
+ <literal>off</literal>.
+ </para>
+ <para>
+ 4. Lock the required tables of the existing nodes except the first node in
+ <literal>EXCLUSIVE</literal> mode until the setup is complete. This is
+ required to prevent any modifications from happening. If data modifications
+ occur, there is a chance that the modifications done in between step-5 and
+ step-6 will not be synchronized to the new node resulting in inconsistent
+ data. No need to lock the tables in the first node as any data changes
+ made will be synchronized while creating the subscription with
+ <literal>copy_data</literal> specified as <literal>force</literal>.
+ </para>
+ <para>
+ 5. Create a subscription on the new node pointing to publication on the
+ first node with <literal>origin</literal> parameter specified as
+ <literal>local</literal> and <literal>copy_data</literal> parameter
+ specified as <literal>force</literal>.
+ </para>
+ <para>
+ 6. Create subscriptions on the new node pointing to publications on the
+ remaining nodes with <literal>origin</literal> parameter specified as
+ <literal>local</literal> and <literal>copy_data</literal> parameter
+ specifiedas <literal>off</literal>.
+ </para>

Following suggestions make the following changes:
- Some minor rewording
- Change the step names
- Use "=" instead of "specified as" for the parameters
- I felt it is more readable if the explanatory notes are separate
paragraphs from the steps. Maybe if you could indent them or something
it would be even better.
- Added a couple more explanatory notes

SUGGESTION (using those above changes; please keep your formatting)

Step-1: Create a publication on the new node.

Step-2: Lock the required tables of the new node in EXCLUSIVE mode
until the setup is complete.

(This lock is necessary to prevent any modifications from happening in
the new node because if data modifications occurred after Step-3,
there is a chance that the modifications will be published to the
first node and then synchronized back to the new node while creating
the subscription in Step-5. This would result in inconsistent data).

Step-3. Create subscriptions on existing nodes to the publication on
the new node with origin = local and copy_data = off.

(The copy_data = off is OK here because it is asserted that the
published tables of the new node will have no pre-existing data).

Step-4. Lock the required tables of the existing nodes except the
first node in EXCLUSIVE mode until the setup is complete.

(This lock is necessary to prevent any modifications from happening.
If data modifications occur, there is a chance that modifications done
between Step-5 and Step-6 will not be synchronized to the new node.
This would result in inconsistent data. There is no need to lock table
t1 in node1 because any data changes made will be synchronized while
creating the subscription with copy_data = force).

Step-5. Create a subscription on the new node to the publication on
the first node with origin = local and copy_data = force.

(This will copy the same table data from the existing nodes to the new node)

Step-6. Create subscriptions on the new node to publications on the
remaining nodes with origin = local and copy_data = off.

(The copy_data = off is OK here because the existing node data was
already copied to the new node in Step-5)

======

20. doc/src/sgml/ref/create_subscription.sgml

- <literal>copy_data = force</literal>.
+ <literal>copy_data = force</literal>. Refer to the
+ <xref linkend="logical-replication-bidirectional"/> on how
+ <literal>copy_data</literal> and <literal>origin</literal> can be used
+ in bidirectional replication.
</para>

SUGGESTION (keep your formatting)
Refer to <xref linkend="logical-replication-bidirectional"/> for how
<literal>copy_data</literal> and <literal>origin</literal> can be used
in bidirectional replication.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message shiy.fnst@fujitsu.com 2022-06-10 08:45:04 RE: Replica Identity check of partition table on subscriber
Previous Message Andrey Borodin 2022-06-10 08:21:43 Re: better page-level checksums