RE: Data is copied twice when specifying both child and parent table in publication

From: "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>
To: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Amit Langote <amitlangote09(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: RE: Data is copied twice when specifying both child and parent table in publication
Date: 2022-04-07 03:08:14
Message-ID: OS3PR01MB6275CD6236DEB022C0785E9B9EE69@OS3PR01MB6275.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thur, Mar 10, 2021 at 10:08 AM houzj(dot)fnst(at)fujitsu(dot)com wrote:
> Hi,
>
> When reviewing some logical replication related features. I noticed another
> possible problem if the subscriber subscribes multiple publications which
> publish parent and child table.
>
> For example:
>
> ----pub
> create table t (a int, b int, c int) partition by range (a);
> create table t_1 partition of t for values from (1) to (10);
>
> create publication pub1 for table t
> with (PUBLISH_VIA_PARTITION_ROOT);
> create publication pub2 for table t_1
> with (PUBLISH_VIA_PARTITION_ROOT);
>
> ----sub
> ---- prepare table t and t_1
> CREATE SUBSCRIPTION sub CONNECTION 'port=10000 dbname=postgres'
> PUBLICATION pub1, pub2;
>
> select * from pg_subscription_rel ;
> srsubid | srrelid | srsubstate | srsublsn
> ---------+---------+------------+-----------
> 16391 | 16385(t) | r | 0/150D100
> 16391 | 16388(t_1) | r | 0/150D138
>
> If subscribe two publications one of them publish parent table with
> (pubviaroot=true) and another publish child table. Both the parent table and
> child table will exist in pg_subscription_rel which also means we will do
> initial copy for both tables.
>
> But after initial copy, we only publish change with the schema of the parent
> table(t). It looks a bit inconsistent.
>
> Based on the document of PUBLISH_VIA_PARTITION_ROOT option. I think the
> expected behavior could be we only store the top most parent(table t) in
> pg_subscription_rel and do initial copy for it if pubviaroot is on. I haven't
> thought about how to fix this and will investigate this later.
Hi,
I try to fix this bug. Attach the patch.

The current HEAD get table list for one publication by invoking function
pg_get_publication_tables. If multiple publications are subscribed, then this
function is invoked multiple times. So option PUBLISH_VIA_PARTITION_ROOT works
independently on every publication, I think it does not work correctly on
different publications of the same subscription.

So I fix this bug by the following two steps:
First step,
I get oids of subscribed tables by publication list. Then for tables with the
same topmost root table, I filter them base on the option
PUBLISH_VIA_PARTITION_ROOT(see new function filter_partitions_oids).
After filtering, I get the final oid list.
Second step,
I get the required informations(nspname and relname) base on the oid list of
first step.

Regards,
Wang wei

Attachment Content-Type Size
v1-0001-Fix-data-replicated-twice-when-specifying-PUBLISH.patch application/octet-stream 11.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2022-04-07 03:25:06 Re: How about a psql backslash command to show GUCs?
Previous Message Tom Lane 2022-04-07 03:02:54 Re: How about a psql backslash command to show GUCs?