Re: Handle infinite recursion in logical replication setup

From: vignesh C <vignesh21(at)gmail(dot)com>
To: "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>
Subject: Re: Handle infinite recursion in logical replication setup
Date: 2022-07-25 08:54:00
Message-ID: CALDaNm1eJr6qXT9esVPzgc5Qvy4uMhV4kCCTSmxARKjf+Mwcnw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Jul 24, 2022 at 10:21 PM Jonathan S. Katz <jkatz(at)postgresql(dot)org> wrote:
>
> On 7/22/22 12:47 AM, Amit Kapila wrote:
> > On Fri, Jul 22, 2022 at 1:39 AM Jonathan S. Katz <jkatz(at)postgresql(dot)org> wrote:
>
> >> 1. I'm concerned by calling this "Bidirectional replication" in the docs
> >> that we are overstating the current capabilities. I think this is
> >> accentuated int he opening paragraph:
> >>
> >> ==snip==
> >> Bidirectional replication is useful for creating a multi-master database
> >> environment for replicating read/write operations performed by any of the
> >> member nodes.
> >> ==snip==
> >>
> >> For one, we're not replicating reads, we're replicating writes. Amongst
> >> the writes, at this point we're only replicating DML. A reader could
> >> think that deploying can work for a full bidirectional solution.
> >>
> >> (Even if we're aspirationally calling this section "Bidirectional
> >> replication", that does make it sound like we're limited to two nodes,
> >> when we can support more than two).
> >>
> >
> > Right, I think the system can support N-Way replication.
>
> I did some more testing of the feature, i.e. doing 3-node and 4-node
> tests. While logical replication today can handle replicating between
> multiple nodes (N-way), the "origin = none" does require setting up
> subscribers between each of the nodes.
>
> For example, if I have 4 nodes A, B, C, D and I want to replicate the
> same table between all of them, I need to set up subscribers between all
> of them (A<=>B, A<=>C, A<=>D, B<=>C, B<=>D, C<=>D). However, each node
> can replicate between each other in a way that's convenient (vs. having
> to do something funky with partitions) so this is still a big step forward.
>
> This is a long way of saying that I do think it's fair to say we support
> "N-way" replication so long as you are set up in a mesh (vs. a ring,
> e.g. A=>B=>C=>D=>A).
>
> > Among the above "Replicating changes between primaries" sounds good to
> > me or simply "Replication between primaries". As this is a sub-section
> > on the Logical Replication page, I feel it is okay to not use Logical
> > in the title.
>
> Agreed, I think that's fine.
>
> >> At a minimum, I think we should reference the documentation we have in
> >> the logical replication section on conflicts. We may also want to advise
> >> that a user is responsible for designing their schemas in a way to
> >> minimize the risk of conflicts.
> >>
> >
> > This sounds reasonable to me.
> >
> > One more point about docs, it appears to be added as the last
> > sub-section on the Logical Replication page. Is there a reason for
> > doing so? I feel this should be third sub-section after describing
> > Publication and Subscription.
>
> When I first reviewed, I had not built the docs. Did so on this pass.
>
> I agree with the positioning argument, i.e. it should go after
> "Subscription" in the table of contents -- but it makes me question a
> couple of things:
>
> 1. The general ordering of the docs
> 2. How we describe that section (more on that in a sec)
> 3. If "row filters" should be part of "subscription" instead of its own
> section.
>
> If you look at the current order, "Quick setup" is the last section; one
> would think the "quick" portion goes first :) Given a lot of this is for
> the current docs, I may start a separate discussion on -docs for this part.
>
> For the time being, I agree it should be moved to the section after
> "Subscription".
>
> I think what this section describes is "Configuring Replication Between
> Nodes" as it covers a few different scenarios.
>
> I do think we need to iterate on these docs -- the examples with the
> commands are generally OK and easy to follow, but a few things I noticed:
>
> 1. The general description of the section needs work. We may want to
> refine the description of the use cases, and in the warning, link to
> instructions on how to take backups.
> 2. We put the "case not supported" in the middle, not at the end.
> 3. The "generic steps for adding a new node..." section uses a
> convention for steps that is not found in the docs. We also don't
> provide an example for this section, and this is the most complicated
> scenario to set up.
>
> I may be able to propose some suggestions in a few days.
>
> > BTW, do you have any opinion on the idea of the first remaining patch
> > where we accomplish two things: a) Checks and throws an error if
> > 'copy_data = on' and 'origin = none' but the publication tables were
> > also replicated from other publishers. b) Adds 'force' value for
> > copy_data parameter to allow copying in such a case. The primary
> > reason for this patch is to avoid loops or duplicate data in the
> > initial phase. We can't skip copying based on origin as we can do
> > while replicating changes from WAL. So, we detect that the publisher
> > already has data from some other node and doesn't allow replication
> > unless the user uses the 'force' option for copy_data.
>
> In general, I agree with the patch; but I'm not sure why we are calling
> "copy_data = force" in this case and how it varies from "on". I
> understand the goal is to prevent the infinite loop, but is there some
> technical restriction why we can't set "origin = none, copy_data = on"
> and have this work (and apologies if I missed that upthread)?

Let's take a simple case to understand why copy_data = force is
required to replicate between two primaries for table t1 which has
data as given below:
Node-1:
Table t1 (c1 int) has data
1, 2, 3, 4

Node-2:
Table t1 (c1 int) has data
5, 6, 7, 8

step1 - Node-1
#Publication for t1
Create Publication pub1_2 For Table t1;

step2 - Node-2
#Publication for t1,
Create Publication pub2_1 For Table t1;

step3 - Node-1:
Create Subscription sub1 Connection '<node-2 details>' publication
pub2_1 with (origin = none);

After this the data will be something like this:
Node-1:
1, 2, 3, 4, 5, 6, 7, 8

Node-2:
5, 6, 7, 8

step4 - Node-2:
Create Subscription sub2 Connection '<node-1 details>' Publication
pub1_2 with (origin = none, copy_data=on);
If we had allowed the create subscription to be successful with
copy_data = on. After this the data will be something like this:
Node-1:
1, 2, 3, 4, 5, 6, 7, 8

Node-2:
1, 2, 3, 4, 5, 6, 7, 8, 5, 6, 7, 8

So, you can see that data on Node-2 (5, 6, 7, 8) is duplicated. In
case, table t1 has a unique key, it will lead to a unique key
violation and replication won't proceed.

To avoid this we will throw an error:
ERROR: could not replicate table "public.t1"
DETAIL: CREATE/ALTER SUBSCRIPTION with origin = none and copy_data =
on is not allowed when the publisher has subscribed same table.
HINT: Use CREATE/ALTER SUBSCRIPTION with copy_data = off/force.

Users can then overcome this problem by using the following steps:
step1 to step3 is the same as above.

step4 - Node-2
# Disallow truncates to be published and then truncate the table
Alter Publication pub2_1 Set (publish = 'insert, update, delete');
Truncate t1;

After this the data will be like this:
Node-1:
1, 2, 3, 4, 5, 6, 7, 8

Node-2: no data

step5 - Node-2
Create Subscription sub2 Connection '<node-1 details>' Publication
pub1_2 with (origin = none, copy_data = force);

After this the data will be in sync:
Node-1:
1, 2, 3, 4, 5, 6, 7, 8

Node-2:
1, 2, 3, 4, 5, 6, 7, 8

step6 - Node-1
# Now include truncates to be published
Alter Publication pub1_2 Set (publish = 'insert, update, delete, truncate');

Now the replication setup between two primaries node1 and node2 is
complete. Any incremental changes from node1 will be replicated to
node2, and any incremental changes from node2 will be replicated to
node1.

Regards,
Vignesh

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2022-07-25 08:54:36 Re: SLRUs in the main buffer pool, redux
Previous Message Bharath Rupireddy 2022-07-25 08:51:38 Re: Is it correct to say, "invalid data in file \"%s\"", BACKUP_LABEL_FILE in do_pg_backup_stop?