RE: Data is copied twice when specifying both child and parent table in publication

From: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: RE: Data is copied twice when specifying both child and parent table in publication
Date: 2022-08-30 02:24:10
Message-ID: OS0PR01MB5716A30DDEECC59132E1084F94799@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wednesday, August 10, 2022 7:45 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Here are some more review comments for the HEAD_v8 patch:
>
> ======
>
> 1. Commit message
>
> If there are two publications, one of them publish a parent table with
> (publish_via_partition_root = true) and another publish child table, subscribing
> to both publications from one subscription results in two initial replications. It
> should only be copied once.
>
> ~
>
> I took a 2nd look at that commit message and it seemed slightly backwards to
> me - e.g. don't you really mean for the 'publish_via_partition_root' parameter
> to be used when publishing the
> *child* table?

I'm not sure about this, I think we are trying to fix the bug when
'publish_via_partition_root' is used when publishing the parent table.

For this case(via_root used when publishing parent):

CREATE PUBLICATION pub1 for TABLE parent with(publish_via_partition_root);
CREATE PUBLICATION pub2 for TABLE child;
CREATE SUBSCRIPTION sub connect xxx PUBLICATION pub1,pub2;

The expected behavior is only the parent table is published, all the changes
should be replicated using the parent table's identity. So, we should only do
initial sync for the parent table once, but we currently will do table sync for
both parent and child which we think is a bug.

For another case you mentioned(via_root used when publishing child)

CREATE PUBLICATION pub1 for TABLE parent;
CREATE PUBLICATION pub2 for TABLE child with (publish_via_partition_root);
CREATE SUBSCRIPTION sub connect xxx PUBLICATION pub1,pub2;

The expected behavior is only the child table is published, all the changes
should be replicated using the child table's identity. We should do table sync
only for child tables and is same as the current behavior on HEAD. So, I think
there is no bug in this case.

Best regards,
Hou zj

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2022-08-30 02:29:01 Re: Reducing the chunk header sizes on all memory context types
Previous Message Robert Haas 2022-08-30 02:16:26 Re: replacing role-level NOINHERIT with a grant-level option