Re: Handle infinite recursion in logical replication setup

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>
Subject: Re: Handle infinite recursion in logical replication setup
Date: 2022-08-04 13:00:53
Message-ID: CAD21AoBBu8B_mZsE6apRacbPO8Am9Jg2GQcCTDNziJ0bF8Vd+Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Aug 2, 2022 at 8:59 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Jul 26, 2022 at 9:07 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Tue, Jul 26, 2022 at 7:13 AM Jonathan S. Katz <jkatz(at)postgresql(dot)org> wrote:
> > >
> > > Thanks for the example. I agree that it is fairly simple to reproduce.
> > >
> > > I understand that "copy_data = force" is meant to protect a user from
> > > hurting themself. I'm not convinced that this is the best way to do so.
> > >
> > > For example today I can subscribe to multiple publications that write to
> > > the same table. If I have a primary key on that table, and two of the
> > > subscriptions try to write an identical ID, we conflict. We don't have
> > > any special flags or modes to guard against that from happening, though
> > > we do have documentation on conflicts and managing them.
> > >
> > > AFAICT the same issue with "copy_data" also exists in the above scenario
> > > too, even without the "origin" attribute.
> > >
> >
> > That's true but there is no parameter like origin = NONE which
> > indicates that constraint violations or duplicate data problems won't
> > occur due to replication. In the current case, I think the situation
> > is different because a user has specifically asked not to replicate
> > any remote data by specifying origin = NONE, which should be dealt
> > differently. Note that current users or their setup won't see any
> > difference/change unless they specify the new parameter origin as
> > NONE.
> >
>
> Let me try to summarize the discussion so that it is easier for others
> to follow. The work in this thread is to avoid loops, and duplicate
> data in logical replication when the operations happened on the same
> table in multiple nodes. It has been explained in email [1] with an
> example of how a logical replication setup can lead to duplicate or
> inconsistent data.
>
> The idea to solve this problem is that we don't replicate data that is
> not generated locally which we can normally identify based on origin
> of data in WAL. The commit 366283961a achieves that for replication
> but still the problem can happen during initial sync which is
> performed internally via copy. We can't differentiate the data in heap
> based on origin. So, we decided to prohibit the subscription
> operations that can perform initial sync (ex. Create Subscription,
> Alter Subscription ... Refresh) by detecting that the publisher has
> subscribed to the same table from some other publisher.
>
> To prohibit the subscription operations, the currently proposed patch
> throws an error. Then, it also provides a new copy_data option
> 'force' under which the user will still be able to perform the
> operation. This could be useful when the user intentionally wants to
> replicate the initial data even if it contains data from multiple
> nodes (for example, when in a multi-node setup, one decides to get the
> initial data from just one node and then allow replication of data to
> proceed from each of respective nodes).
>
> The other alternative discussed was to just give a warning for
> subscription operations and probably document the steps for users to
> avoid it. But the problem with that is once the user sees this
> warning, it won't be able to do anything except recreate the setup, so
> why not give an error in the first place?
>
> Thoughts?

Thank you for the summary!

I understand that this feature could help some cases, but I'm really
not sure adding new value 'force' for them is worthwhile, and
concerned it could reduce the usability.

IIUC this feature would work only when origin = 'none' and copy_data =
'on', and the purpose is to prevent the data from being
duplicated/conflicted by the initial table sync. But there are cases
where duplication/conflict doesn't happen even if origin = 'none' and
copy_data = 'on'. For instance, the table on the publisher might be
empty. Also, even with origin = 'any', copy_data = 'on', there is a
possibility of data duplication/conflict. Why do we need to address
only the case where origin = 'none'? I think that using origin =
'none' doesn't necessarily mean using bi-directional (or N-way)
replication. Even when using uni-directional logical replication with
two nodes, they may use origin = 'none'. Therefore, it seems to me
that this feature works only for a narrow situation and has false
positives.

Since it has been the user's responsibility not to try to make the
data inconsistent by the initial table sync, I think that it might be
sufficient if we note the risk in the documentation.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2022-08-04 13:17:23 Re: Smoothing the subtrans performance catastrophe
Previous Message Alvaro Herrera 2022-08-04 12:56:45 Re: enable/disable broken for statement triggers on partitioned tables