Re: Handle infinite recursion in logical replication setup

From: vignesh C <vignesh21(at)gmail(dot)com>
To: "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>
Subject: Re: Handle infinite recursion in logical replication setup
Date: 2022-07-25 14:44:28
Message-ID: CALDaNm3SVgcF2v8_JCjm+8OrPSuaGm+VYc8Fm446zMubHE5VSQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jul 25, 2022 at 2:24 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Sun, Jul 24, 2022 at 10:21 PM Jonathan S. Katz <jkatz(at)postgresql(dot)org> wrote:
> >
> > On 7/22/22 12:47 AM, Amit Kapila wrote:
> > > On Fri, Jul 22, 2022 at 1:39 AM Jonathan S. Katz <jkatz(at)postgresql(dot)org> wrote:
> >
> > >> 1. I'm concerned by calling this "Bidirectional replication" in the docs
> > >> that we are overstating the current capabilities. I think this is
> > >> accentuated int he opening paragraph:
> > >>
> > >> ==snip==
> > >> Bidirectional replication is useful for creating a multi-master database
> > >> environment for replicating read/write operations performed by any of the
> > >> member nodes.
> > >> ==snip==
> > >>
> > >> For one, we're not replicating reads, we're replicating writes. Amongst
> > >> the writes, at this point we're only replicating DML. A reader could
> > >> think that deploying can work for a full bidirectional solution.
> > >>
> > >> (Even if we're aspirationally calling this section "Bidirectional
> > >> replication", that does make it sound like we're limited to two nodes,
> > >> when we can support more than two).
> > >>
> > >
> > > Right, I think the system can support N-Way replication.
> >
> > I did some more testing of the feature, i.e. doing 3-node and 4-node
> > tests. While logical replication today can handle replicating between
> > multiple nodes (N-way), the "origin = none" does require setting up
> > subscribers between each of the nodes.
> >
> > For example, if I have 4 nodes A, B, C, D and I want to replicate the
> > same table between all of them, I need to set up subscribers between all
> > of them (A<=>B, A<=>C, A<=>D, B<=>C, B<=>D, C<=>D). However, each node
> > can replicate between each other in a way that's convenient (vs. having
> > to do something funky with partitions) so this is still a big step forward.
> >
> > This is a long way of saying that I do think it's fair to say we support
> > "N-way" replication so long as you are set up in a mesh (vs. a ring,
> > e.g. A=>B=>C=>D=>A).
> >
> > > Among the above "Replicating changes between primaries" sounds good to
> > > me or simply "Replication between primaries". As this is a sub-section
> > > on the Logical Replication page, I feel it is okay to not use Logical
> > > in the title.
> >
> > Agreed, I think that's fine.
> >
> > >> At a minimum, I think we should reference the documentation we have in
> > >> the logical replication section on conflicts. We may also want to advise
> > >> that a user is responsible for designing their schemas in a way to
> > >> minimize the risk of conflicts.
> > >>
> > >
> > > This sounds reasonable to me.
> > >
> > > One more point about docs, it appears to be added as the last
> > > sub-section on the Logical Replication page. Is there a reason for
> > > doing so? I feel this should be third sub-section after describing
> > > Publication and Subscription.
> >
> > When I first reviewed, I had not built the docs. Did so on this pass.
> >
> > I agree with the positioning argument, i.e. it should go after
> > "Subscription" in the table of contents -- but it makes me question a
> > couple of things:
> >
> > 1. The general ordering of the docs
> > 2. How we describe that section (more on that in a sec)
> > 3. If "row filters" should be part of "subscription" instead of its own
> > section.
> >
> > If you look at the current order, "Quick setup" is the last section; one
> > would think the "quick" portion goes first :) Given a lot of this is for
> > the current docs, I may start a separate discussion on -docs for this part.
> >
> > For the time being, I agree it should be moved to the section after
> > "Subscription".
> >
> > I think what this section describes is "Configuring Replication Between
> > Nodes" as it covers a few different scenarios.
> >
> > I do think we need to iterate on these docs -- the examples with the
> > commands are generally OK and easy to follow, but a few things I noticed:
> >
> > 1. The general description of the section needs work. We may want to
> > refine the description of the use cases, and in the warning, link to
> > instructions on how to take backups.
> > 2. We put the "case not supported" in the middle, not at the end.
> > 3. The "generic steps for adding a new node..." section uses a
> > convention for steps that is not found in the docs. We also don't
> > provide an example for this section, and this is the most complicated
> > scenario to set up.
> >
> > I may be able to propose some suggestions in a few days.
> >
> > > BTW, do you have any opinion on the idea of the first remaining patch
> > > where we accomplish two things: a) Checks and throws an error if
> > > 'copy_data = on' and 'origin = none' but the publication tables were
> > > also replicated from other publishers. b) Adds 'force' value for
> > > copy_data parameter to allow copying in such a case. The primary
> > > reason for this patch is to avoid loops or duplicate data in the
> > > initial phase. We can't skip copying based on origin as we can do
> > > while replicating changes from WAL. So, we detect that the publisher
> > > already has data from some other node and doesn't allow replication
> > > unless the user uses the 'force' option for copy_data.
> >
> > In general, I agree with the patch; but I'm not sure why we are calling
> > "copy_data = force" in this case and how it varies from "on". I
> > understand the goal is to prevent the infinite loop, but is there some
> > technical restriction why we can't set "origin = none, copy_data = on"
> > and have this work (and apologies if I missed that upthread)?
>
> Let's take a simple case to understand why copy_data = force is
> required to replicate between two primaries for table t1 which has
> data as given below:
> Node-1:
> Table t1 (c1 int) has data
> 1, 2, 3, 4
>
> Node-2:
> Table t1 (c1 int) has data
> 5, 6, 7, 8
>
> step1 - Node-1
> #Publication for t1
> Create Publication pub1_2 For Table t1;
>
> step2 - Node-2
> #Publication for t1,
> Create Publication pub2_1 For Table t1;
>
> step3 - Node-1:
> Create Subscription sub1 Connection '<node-2 details>' publication
> pub2_1 with (origin = none);
>
> After this the data will be something like this:
> Node-1:
> 1, 2, 3, 4, 5, 6, 7, 8
>
> Node-2:
> 5, 6, 7, 8
>
> step4 - Node-2:
> Create Subscription sub2 Connection '<node-1 details>' Publication
> pub1_2 with (origin = none, copy_data=on);
> If we had allowed the create subscription to be successful with
> copy_data = on. After this the data will be something like this:
> Node-1:
> 1, 2, 3, 4, 5, 6, 7, 8
>
> Node-2:
> 1, 2, 3, 4, 5, 6, 7, 8, 5, 6, 7, 8
>
> So, you can see that data on Node-2 (5, 6, 7, 8) is duplicated. In
> case, table t1 has a unique key, it will lead to a unique key
> violation and replication won't proceed.
>
> To avoid this we will throw an error:
> ERROR: could not replicate table "public.t1"
> DETAIL: CREATE/ALTER SUBSCRIPTION with origin = none and copy_data =
> on is not allowed when the publisher has subscribed same table.
> HINT: Use CREATE/ALTER SUBSCRIPTION with copy_data = off/force.
>
> Users can then overcome this problem by using the following steps:
> step1 to step3 is the same as above.
>
> step4 - Node-2
> # Disallow truncates to be published and then truncate the table
> Alter Publication pub2_1 Set (publish = 'insert, update, delete');
> Truncate t1;
>
> After this the data will be like this:
> Node-1:
> 1, 2, 3, 4, 5, 6, 7, 8
>
> Node-2: no data
>
> step5 - Node-2
> Create Subscription sub2 Connection '<node-1 details>' Publication
> pub1_2 with (origin = none, copy_data = force);
>
> After this the data will be in sync:
> Node-1:
> 1, 2, 3, 4, 5, 6, 7, 8
>
> Node-2:
> 1, 2, 3, 4, 5, 6, 7, 8
>
> step6 - Node-1
> # Now include truncates to be published
> Alter Publication pub1_2 Set (publish = 'insert, update, delete, truncate');
>
> Now the replication setup between two primaries node1 and node2 is
> complete. Any incremental changes from node1 will be replicated to
> node2, and any incremental changes from node2 will be replicated to
> node1.

In the above steps, sorry that I mentioned Node-1 instead of Node-2 in
the last step step6.
The below:
step6 - Node-1
# Now include truncates to be published
Alter Publication pub1_2 Set (publish = 'insert, update, delete, truncate');

should be:
step6 - Node-2
# Now include truncates to be published
Alter Publication pub2_1 Set (publish = 'insert, update, delete, truncate');

Regards,
Vignesh

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2022-07-25 14:52:15 Re: fairywren hung in pg_basebackup tests
Previous Message Martin Kalcher 2022-07-25 14:40:51 [Patch] Fix bounds check in trim_array()