RE: Handle infinite recursion in logical replication setup

From: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>
Subject: RE: Handle infinite recursion in logical replication setup
Date: 2022-08-22 03:49:15
Message-ID: OS0PR01MB5716C383623ADAD64CE4841194719@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thursday, August 18, 2022 11:13 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, Aug 17, 2022 at 12:34 PM Peter Smith <smithpb2250(at)gmail(dot)com>
> wrote:
> >
> > On Wed, Aug 17, 2022 at 4:33 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> > >
> > > On Wed, Aug 17, 2022 at 8:48 AM houzj(dot)fnst(at)fujitsu(dot)com
> > > <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> > > >
> > > > On Tuesday, August 2, 2022 8:00 PM Amit Kapila
> <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > > On Tue, Jul 26, 2022 at 9:07 AM Amit Kapila
> <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > >
> > > > Thanks for the summary.
> > > >
> > > > I think it's fine to make the user use the copy_data option more
> > > > carefully to prevent duplicate copies by reporting an ERROR.
> > > >
> > > > But I also have similar concern with Sawada-san as it's possible
> > > > for user to receive an ERROR in some unexpected cases.
> > > >
> > > > For example I want to build bi-directional setup between two nodes:
> > > >
> > > > Node A: TABLE test (has actual data) Node B: TABLE test (empty)
> > > >
> > > > Step 1:
> > > > CREATE PUBLICATION on both Node A and B.
> > > >
> > > > Step 2:
> > > > CREATE SUBSCRIPTION on Node A with (copy_data = on)
> > > > -- this is fine as there is no data on Node B
> > > >
> > > > Step 3:
> > > > CREATE SUBSCRIPTION on Node B with (copy_data = on)
> > > > -- this should be fine as user needs to copy data from Node A to
> > > > Node B,
> > > > -- but we still report an error for this case.
> > > >
> > > > It looks a bit strict to report an ERROR in this case and it seems
> > > > not easy to avoid this. So, personally, I think it might be better
> > > > to document the correct steps to build the bi-directional
> > > > replication and probably also docuemnt the steps to recover if
> > > > user accidently did duplicate initial copy if not documented yet.
> > > >
> > > > In addition, we could also LOG some additional information about
> > > > the ORIGIN and initial copy which might help user to analyze if needed.
> > > >
> > >
> > > But why LOG instead of WARNING? I feel in this case there is a
> > > chance of inconsistent data so a WARNING like "publication "pub1"
> > > could have data from multiple origins" can be given when the user
> > > has specified
> > > options: "copy_data = on, origin = NONE" while creating a
> > > subscription. We give a WARNING during subscription creation when
> > > the corresponding publication doesn't exist, eg.
> > >
> > > postgres=# create subscription sub1 connection 'dbname = postgres'
> > > publication pub1;
> > > WARNING: publication "pub1" does not exist in the publisher
> > >
> > > Then, we can explain in docs how users can avoid data
> > > inconsistencies while setting up replication.
> > >
> >
> > I was wondering if this copy/origin case really should be a NOTICE.
> >
>
> We usually give NOTICE for some sort of additional implicit information, e.g.,
> when we create a slot during CREATE SUBSCRIPTION
> command: "NOTICE: created replication slot "sub1" on publisher". IMO, this is
> likely to be a problem of data inconsistency so I think here we can choose
> between WARNING and LOG. I prefer WARNING but okay with LOG as well if
> others feel so. I think we can change this later as well if required. We do have an
> option to not do anything and just document it but I feel it is better to give user
> some indication of problem here because not everyone reads each update of
> documentation.
>
> Jonathan, Sawada-San, Hou-San, and others, what do you think is the best way
> to move forward here?

I think it's fine to throw a WARNING in this case given that there is a
chance of inconsistent data.

Best regards,
Hou zj

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2022-08-22 04:41:55 Re: Fix typo with logical connector (src/backend/commands/vacuumparallel.c)
Previous Message Amit Langote 2022-08-22 03:42:02 Re: cataloguing NOT NULL constraints