Re: Data is copied twice when specifying both child and parent table in publication

From: Greg Nancarrow <gregn4422(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Amit Langote <amitlangote09(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: Re: Data is copied twice when specifying both child and parent table in publication
Date: 2021-11-05 03:20:15
Message-ID: CAJcOf-eURu03QNmD=37PtsxuNW4nBGN3G_FdRMBx_tpkeyzDUw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Nov 4, 2021 at 7:10 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Thu, Nov 4, 2021 at 12:23 PM Greg Nancarrow <gregn4422(at)gmail(dot)com>
wrote:
> >
> > On Thu, Nov 4, 2021 at 3:13 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
wrote:
> > >
> > > On further thinking about this, I think we should define the behavior
> > > of replication among partitioned (on the publisher) and
> > > non-partitioned (on the subscriber) tables a bit more clearly.
> > >
> > > - If the "publish_via_partition_root" is set for a publication then we
> > > can always replicate to the table with the same name as the root table
> > > in publisher.
> > > - If the "publish_via_partition_root" is *not* set for a publication
> > > then we can always replicate to the tables with the same name as the
> > > non-root tables in publisher.
> > >
> > > Thoughts?
> > >
> >
> > I'd adjust that wording slightly, because "we can always replicate to
> > ..." sounds a bit vague, and saying that an option is set or not set
> > could be misinterpreted, as the option could be "set" to false.
> >
> > How about:
> >
> > - If "publish_via_partition_root" is true for a publication, then data
> > is replicated to the table with the same name as the root (i.e.
> > partitioned) table in the publisher.
> > - If "publish_via_partition_root" is false (the default) for a
> > publication, then data is replicated to tables with the same name as
> > the non-root (i.e. partition) tables in the publisher.
> >
>
> Sounds good to me. If we follow this then I think the patch by Hou-San
> is good to solve the first problem as described in his last email [1]?
>
> [1] -
https://www.postgresql.org/message-id/OS0PR01MB5716C756312959F293A822C794869%40OS0PR01MB5716.jpnprd01.prod.outlook.com
>

Almost.
The patch does seem to solve that first problem (double publish on
tablesync).
I used the following test (taken from [2]), and variations of it:

--- Setup
create schema sch1;
create schema sch2;
create table sch1.tbl1 (a int) partition by range (a);
create table sch2.tbl1_part1 partition of sch1.tbl1 for values from (1) to
(10);
create table sch2.tbl1_part2 partition of sch1.tbl1 for values from
(10) to (20);
create schema sch3;
create table sch3.t1(c1 int);

--- Publication
create publication pub1 for all tables in schema sch3, table
sch1.tbl1, table sch2.tbl1_part1 with ( publish_via_partition_root=on);
insert into sch1.tbl1 values(1);
insert into sch1.tbl1 values(11);
insert into sch3.t1 values(1);

---- Subscription
CREATE SUBSCRIPTION sub CONNECTION 'dbname=postgres host=localhost
port=5432' PUBLICATION pub1;

[2] -
https://postgr.es/m/CALDaNm3vxjPMMSrVDNK0f8UWP+EQ5ry14xfEukmXsVg_UcwZNA@mail.gmail.com

However, there did still seem to be a problem, if
publish_via_partition_root is then set to false; it seems that can result
in duplicate partition entries in the pg_publication_tables view, see below
(this follows on from the test scenario given above):

postgres=# select * from pg_publication_tables;
pubname | schemaname | tablename
---------+------------+-----------
pub1 | sch1 | tbl1
pub1 | sch3 | t1
(2 rows)

postgres=# alter publication pub1 set (publish_via_partition_root=false);
ALTER PUBLICATION
postgres=# select * from pg_publication_tables;
pubname | schemaname | tablename
---------+------------+------------
pub1 | sch2 | tbl1_part1
pub1 | sch2 | tbl1_part2
pub1 | sch2 | tbl1_part1
pub1 | sch3 | t1
(4 rows)

So I think the patch would need to be updated to prevent that.

Regards,
Greg Nancarrow
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ken Kato 2021-11-05 03:31:42 Re: [PATCH] Added TRANSFORM FOR for COMMENT tab completion
Previous Message Kyotaro Horiguchi 2021-11-05 03:17:31 Re: Allow escape in application_name