Re: Handle infinite recursion in logical replication setup

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Handle infinite recursion in logical replication setup
Date: 2022-03-02 10:28:55
Message-ID: CAA4eK1+co2cd8a6okgUD_pcFEHcc7mVc0k_RE2=6ahyv3WPRMg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Feb 23, 2022 at 11:59 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
...
...
> I have attached a basic patch for this, if the idea is accepted, I
> will work further to test more scenarios, add documentation, and test
> and post an updated patch.
> For the second problem, Table synchronization of table including local
> data and replicated data using copy command.
>
> Let us consider the following scenario:
> a) node1 publishing to node2 b) node2 publishing to node1. Here in
> this case node1 will have replicated data from node2 and vice versa.
>
> In the above if user wants to include node3 to subscribe data from
> node2. Users will have to create a subscription in node3 to get the
> data from node2. During table synchronization we send the complete
> table data from node2 to node3. Node2 will have local data from node2
> and also replicated data from node1. Currently we don't have an option
> to differentiate between the locally generated data and replicated
> data in the heap which will cause infinite recursion as described
> above.
>
> To handle this if user has specified only_local option, we could throw
> a warning or error out while creating subscription in this case, we
> could have a column srreplicateddata in pg_subscription_rel which
> could indicate if the table has any replicated data or not:
> postgres=# select * from pg_subscription_rel;
> srsubid | srrelid | srsubstate | srsublsn | srreplicateddata
> ---------+---------+------------+-----------+------------------
> 16389 | 16384 | r | 0/14A4640 | t
> 16389 | 16385 | r | 0/14A4690 | f
> (1 row)
>
> In the above example, srreplicateddata with true indicates, tabel t1
> whose relid is 16384 has replicated data and the other row having
> srreplicateddata as false indicates table t2 whose relid is 16385
> does not have replicated data.
> When creating a new subscription, the subscriber will connect to the
> publisher and check if the relation has replicated data by checking
> srreplicateddata in pg_subscription_rel table.
> If the table has any replicated data, log a warning or error for this.
>

If you want to give the error in this case, then I think we need to
provide an option to the user to allow copy. One possibility could be
to extend existing copy_data option as 'false', 'true', 'force'. For
'false', there shouldn't be any change, for 'true', if 'only_local'
option is also set and the new column indicates replicated data then
give an error, for 'force', we won't give an error even if the
conditions as mentioned for 'true' case are met, rather we will allow
copy in this case.

> Also, we could document the steps on how to handle the initial sync like:
> a) Complete the ongoing transactions on this table in the replication
> setup nodes i.e. node1 and node2 in the above case, so that the table
> data is consistent, b) Once there are no ongoing transaction, Copy the
> table data using copy command from any one of the nodes, c) create
> subscription with copy_data option as off d) Perform further
> transactions on the table e) All the further transactions performed
> will be handled by the walsender which will take care of skipping
> replicated data and sending only the local data. i.e. node2 will send
> the locally generated data to node3.
>
> I'm not sure if there is any other better way to handle this.
>

I could think of the below options for users to set up bi-directional
replication for the same table.

Option-1:
There is no pre-existing data in the tables that are going to
participate in bi-directional replication. In such a case, Users can
create pub/sub (with only_local option as proposed by you) on both
nodes before starting any writes on tables. This will allow
bi-directional replication for the required tables. Now, if the user
wants one of the nodes to join at a later point, then the strategy in
Option-2/3 could be used.

Option-2:
One of the nodes (say node-1) has some pre-existing data and another
node (say node-2) doesn't have any pre-existing data. In this case,
the user can set up pub/sub (with only_local and copy_data as 'false'
options) for node-1 first before any of the operations on node-2.
Then, it can set up pub/sub on node-2. This will allow bi-directional
replication for the required tables.

Option-3:
Both the nodes have some pre-existing data. I think the easiest option
could be to truncate data on one of the nodes and set up pub/sub on
both nodes. See, one way to achieve it among two nodes as below:

Node-1:
Table t1 has data
1, 2, 3, 4

Publication for t1, pub1: Create Publication pub1 For Table t1;

Node-2:
Table t1 has data
5, 6, 7, 8
Publication for t1, pub1_2: Create Publication pub1_2 For Table t1;

Now, Create Subscription for pub1 on node1: Create Subscription sub1_2
Connection '<node-1 details>' Publication pub1 WITH (only_local =
true);

Node-1:
Begin;
# Disallow truncates to be published
Alter Publication pub1 Set (publish = 'insert, update, delete');
Truncate t1;
Create Subscription sub1 Connection '<node-2 details>' Publication
pub1 WITH (only_local = true, copy_data = 'force');
Alter Publication pub1 Set (publish = 'insert, update, delete, truncate');
Commit;

I think this will allow the bi-directional replication between two
nodes. In this scheme, the user needs to manually perform some steps
including truncate of the table on one of the nodes which she might or
might not like but at least there will be a way to set up a
bi-directional replication on two nodes for same table operations
which is not possible now.

I think one can even imagine using and extending this functionality so
that users don't need to perform TRUNCATE on one of the nodes. Say, in
the above case for tablesync phase, we make both nodes to start a
transaction, create a slot on another node (with USE_SNAPSHOT option),
and then allow copy from another node. I think it will be important to
allow copy on each node once the slots are created and the initial
snapshot is established.

For more than two nodes, I think we can suggest having either of the
option-1 or 2 for setup. But, there could be other ways as well
depending on how the user wants to do the setup.

Thoughts?

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2022-03-02 10:31:24 Re: standby recovery fails (tablespace related) (tentative patch and discussion)
Previous Message Aleksander Alekseev 2022-03-02 10:25:17 Re: pg_stop_backup() v2 incorrectly marked as proretset