RE: Data is copied twice when specifying both child and parent table in publication

From: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Greg Nancarrow <gregn4422(at)gmail(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: RE: Data is copied twice when specifying both child and parent table in publication
Date: 2021-11-16 01:56:41
Message-ID: OS0PR01MB5716DD45E8E47CDCD1BAB63094999@OS0PR01MB5716.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

On Friday, November 12, 2021 12:28 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Thu, Nov 11, 2021 at 12:22 PM houzj(dot)fnst(at)fujitsu(dot)com
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > On Friday, November 5, 2021 11:20 AM Greg Nancarrow <gregn4422(at)gmail(dot)com> wrote:
> > >On Thu, Nov 4, 2021 at 7:10 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > >Almost.
> > >The patch does seem to solve that first problem (double publish on
> > >tablesync).
> > >I used the following test (taken from [2]), and variations of it:
> > >
> > >However, there did still seem to be a problem, if
> > >publish_via_partition_root is then set to false; it seems that can result in
> > >duplicate partition entries in the pg_publication_tables view, see below
> > >(this follows on from the test scenario given above):
> > >
> > >postgres=# select * from pg_publication_tables;
> > > pubname | schemaname | tablename
> > >---------+------------+-----------
> > > pub1    | sch1       | tbl1
> > > pub1    | sch3       | t1
> > >(2 rows)
> > >
> > >postgres=# alter publication pub1 set (publish_via_partition_root=false);
> > >ALTER PUBLICATION
> > >postgres=# select * from pg_publication_tables;
> > > pubname | schemaname | tablename
> > >---------+------------+------------
> > > pub1    | sch2       | tbl1_part1
> > > pub1    | sch2       | tbl1_part2
> > > pub1    | sch2       | tbl1_part1
> > > pub1    | sch3       | t1
> > >(4 rows)
> > >
> > >So I think the patch would need to be updated to prevent that.
> >
> > Thanks for testing the patch.
> >
> > The reason for the duplicate output is that the existing function
> > GetPublicationRelations doesn't de-duplicate the output OID list. So,
> > when both a child and its parent table are added to the publication
> > (pubviaroot = false), the pg_publication_tables view outputs
> > duplicate partitions.
> >
> > Attached are the fix patches.
> > 0001 fixes the data double publish (the first issue in this thread).
> > 0002 fixes the duplicate partition in the pg_publication_tables view
> > (reported by Greg when testing the 0001 patch).
> >
>
> Can we start a separate thread to discuss the 0002 patch, as that doesn't seem
> directly related to the duplicate data issues being discussed here?
> Please specify the exact test in the email as that would make it easier to
> understand the problem.

Thanks for the suggestion.
I have started a new thread about this issue[1].

[1] https://www.postgresql.org/message-id/OS0PR01MB5716E97F00732B52DC2BBC2594989%40OS0PR01MB5716.jpnprd01.prod.outlook.com

Best regards,
Hou zj
