Re: Data is copied twice when specifying both child and parent table in publication

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>
Cc: "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: Re: Data is copied twice when specifying both child and parent table in publication
Date: 2022-10-21 09:01:54
Message-ID: CAHut+PtvCPuz21aZ2HxE5+4tbjuEmT2=_ZJmEXpuT3b274sKuw@mail.gmail.com
Lists: pgsql-hackers

Here are my review comments for HEAD patches v13*

//////

Patch HEAD_v13-0001

I already posted some follow-up questions. See [1]

/////

Patch HEAD_v13-0002

1. Commit message

The following usage scenarios are not described in detail in the manual:
If one subscription subscribes multiple publications, and these publications
publish a partitioned table and its partitions respectively. When we specify
this parameter on one or more of these publications, which identity and schema
should be used to publish the changes?

In these cases, I think the parameter publish_via_partition_root behave as
follows:

~

This seemed a bit strangely worded. Also, you said "on one or more of
these publications" but the examples show only one publication
having 'publish_via_partition_root'.

SUGGESTION (I've modified the wording slightly but the examples are unchanged).

Assume a subscription is subscribing to multiple publications, and
these publications publish a partitioned table and its partitions
respectively:

[publisher-side]
create table parent (a int primary key) partition by range (a);
create table child partition of parent default;

create publication pub1 for table parent;
create publication pub2 for table child;

[subscriber-side]
create subscription sub connection 'xxxx' publication pub1, pub2;

The manual does not clearly describe the behaviour when the user has
specified the parameter 'publish_via_partition_root' on just one of
the publications. This patch modifies the documentation to clarify the
following rules:

- If the parameter publish_via_partition_root is specified only in pub1,
changes will be published using the identity and schema of the table 'parent'.

- If the parameter publish_via_partition_root is specified only in pub2,
changes will be published using the identity and schema of the table 'child'.

~~~

2.

- If the parameter publish_via_partition_root is specified only in pub2,
changes will be published using the identity and schema of the table child.

~

Is that right though? This rule seems 100% contrary to the meaning of
'publish_via_partition_root=true'.
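
To make the case I am asking about concrete, here is a sketch of the
setup as I understand it. The WITH clause on pub2 is my addition to the
commit message example, and the connection string is left as 'xxxx' as
in the original. The catalog queries at the end are only one way to
inspect what each publication reports; I have not stated an expected
result because that is exactly what I am asking about.

```sql
-- [publisher-side]
create table parent (a int primary key) partition by range (a);
create table child partition of parent default;

create publication pub1 for table parent;
-- Assumed variant: the parameter is specified only in pub2.
create publication pub2 for table child
    with (publish_via_partition_root = true);

-- Inspect the parameter and the tables reported for each publication.
select pubname, pubviaroot from pg_publication
    where pubname in ('pub1', 'pub2');
select pubname, schemaname, tablename from pg_publication_tables
    where pubname in ('pub1', 'pub2');

-- [subscriber-side]
create subscription sub connection 'xxxx' publication pub1, pub2;
```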

------

3. doc/src/sgml/ref/create_publication.sgml

+ <para>
+ If a root partitioned table is published by any subscribed publications which
+ set publish_via_partition_root = true, changes on this root partitioned table
+ (or on its partitions) will be published using the identity and schema of this
+ root partitioned table rather than that of the individual partitions.
+ </para>

This seems to only describe the first example from the commit message.
What about some description to explain the second example?

------
[1] https://www.postgresql.org/message-id/CAHut%2BPt%2B1PNx6VsZ-xKzAU-18HmNXhjCC1TGakKX46Wg7YNT1Q%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia
