Re: Handle infinite recursion in logical replication setup

From: "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>
Subject: Re: Handle infinite recursion in logical replication setup
Date: 2022-07-26 01:43:37
Message-ID: afb653ba-e2b1-33a3-a54c-849f4466e1b4@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 7/25/22 4:54 AM, vignesh C wrote:
> On Sun, Jul 24, 2022 at 10:21 PM Jonathan S. Katz <jkatz(at)postgresql(dot)org> wrote:
>>
>> On 7/22/22 12:47 AM, Amit Kapila wrote:
>>> On Fri, Jul 22, 2022 at 1:39 AM Jonathan S. Katz <jkatz(at)postgresql(dot)org> wrote:

>>> BTW, do you have any opinion on the idea of the first remaining patch
>>> where we accomplish two things: a) Checks and throws an error if
>>> 'copy_data = on' and 'origin = none' but the publication tables were
>>> also replicated from other publishers. b) Adds 'force' value for
>>> copy_data parameter to allow copying in such a case. The primary
>>> reason for this patch is to avoid loops or duplicate data in the
>>> initial phase. We can't skip copying based on origin as we can do
>>> while replicating changes from WAL. So, we detect that the publisher
>>> already has data from some other node and doesn't allow replication
>>> unless the user uses the 'force' option for copy_data.
>>
>> In general, I agree with the patch; but I'm not sure why we are calling
>> "copy_data = force" in this case and how it varies from "on". I
>> understand the goal is to prevent the infinite loop, but is there some
>> technical restriction why we can't set "origin = none, copy_data = on"
>> and have this work (and apologies if I missed that upthread)?
>
> Let's take a simple case to understand why copy_data = force is
> required to replicate between two primaries for table t1 which has
> data as given below:

> step4 - Node-2:
> Create Subscription sub2 Connection '<node-1 details>' Publication
> pub1_2 with (origin = none, copy_data=on);
> If we had allowed the create subscription to be successful with
> copy_data = on. After this the data will be something like this:
> Node-1:
> 1, 2, 3, 4, 5, 6, 7, 8
>
> Node-2:
> 1, 2, 3, 4, 5, 6, 7, 8, 5, 6, 7, 8
>
> So, you can see that data on Node-2 (5, 6, 7, 8) is duplicated. In
> case, table t1 has a unique key, it will lead to a unique key
> violation and replication won't proceed.
>
> To avoid this we will throw an error:
> ERROR: could not replicate table "public.t1"
> DETAIL: CREATE/ALTER SUBSCRIPTION with origin = none and copy_data =
> on is not allowed when the publisher has subscribed same table.
> HINT: Use CREATE/ALTER SUBSCRIPTION with copy_data = off/force.

Thanks for the example. I agree that it is fairly simple to reproduce.

I understand that "copy_data = force" is meant to protect a user from
hurting themself. I'm not convinced that this is the best way to do so.

For example today I can subscribe to multiple publications that write to
the same table. If I have a primary key on that table, and two of the
subscriptions try to write an identical ID, we conflict. We don't have
any special flags or modes to guard against that from happening, though
we do have documentation on conflicts and managing them.

AFAICT the same issue with "copy_data" also exists in the above scenario
too, even without the "origin" attribute. However, I think this case is
more noticeable for "origin=none" because we currently default
"copy_data" to "true" and in this case data can be copied in two
directions.

That said, this introduces a new restriction for this particular
scenario that doesn't exist on other scenarios. Instead, I would
advocate we document how to correctly set up the two-way replication
scenario (which we have a draft!), document the warnings around the
conflicts, perhaps include Vignesh's instructions on how to remediate a
conflict on initial sync, and consider throwing a WARNING as you suggested.

Thoughts?

Thanks,

Jonathan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2022-07-26 01:45:52 Re: Handle infinite recursion in logical replication setup
Previous Message Masahiko Sawada 2022-07-26 01:36:39 Introduce wait_for_subscription_sync for TAP tests