RE: Data is copied twice when specifying both child and parent table in publication

From: "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, vignesh C <vignesh21(at)gmail(dot)com>, "Takamichi Osumi (Fujitsu)" <osumi(dot)takamichi(at)fujitsu(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>
Subject: RE: Data is copied twice when specifying both child and parent table in publication
Date: 2023-03-24 09:06:45
Message-ID: OS3PR01MB6275CF983CD249CD7A9432109E849@OS3PR01MB6275.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 24, 2023 at 14:17 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > ======
> > > > src/test/subscription/t/028_row_filter.pl
> > > >
> > > > 7.
> > > > +# insert data into partitioned table.
> > > > +$node_publisher->safe_psql('postgres',
> > > > + "INSERT INTO tab_rowfilter_viaroot_part(a) VALUES(13), (17)");
> > > > +
> > > > $node_subscriber->safe_psql('postgres',
> > > > "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr
> > > > application_name=$appname' PUBLICATION tap_pub_1, tap_pub_2,
> > > > tap_pub_3, tap_pub_4a, tap_pub_4b, tap_pub_5a, tap_pub_5b,
> > > > tap_pub_toast, tap_pub_inherits, tap_pub_viaroot_2, tap_pub_viaroot_1"
> > > > );
> > > > @@ -707,13 +711,17 @@ is($result, qq(t|1), 'check replicated rows to
> > > > tab_rowfilter_toast');
> > > > # the row filter for the top-level ancestor:
> > > > #
> > > > # tab_rowfilter_viaroot_part filter is: (a > 15)
> > > > +# - INSERT (13) NO, 13 < 15
> > > > # - INSERT (14) NO, 14 < 15
> > > > # - INSERT (15) NO, 15 = 15
> > > > # - INSERT (16) YES, 16 > 15
> > > > +# - INSERT (17) YES, 17 > 15
> > > > $result =
> > > > $node_subscriber->safe_psql('postgres',
> > > > - "SELECT a FROM tab_rowfilter_viaroot_part");
> > > > -is($result, qq(16), 'check replicated rows to tab_rowfilter_viaroot_part');
> > > > + "SELECT a FROM tab_rowfilter_viaroot_part ORDER BY 1");
> > > > +is( $result, qq(16
> > > > +17),
> > > > + 'check replicated rows to tab_rowfilter_viaroot_part');
> > > >
> > > > ~
> > > >
> > > > I'm not 100% sure this is testing quite what you want to test. AFAICT
> > > > the subscription is created like:
> > > >
> > > > "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr
> > > > application_name=$appname' PUBLICATION tap_pub_1, tap_pub_2,
> > > > tap_pub_3, tap_pub_4a, tap_pub_4b, tap_pub_5a, tap_pub_5b,
> > > > tap_pub_toast, tap_pub_inherits, tap_pub_viaroot_2, tap_pub_viaroot_1"
> > >
> > > I think this is the scenario we fixed : Simultaneously subscribing to two
> > > publications that publish the parent and child respectively, then want to use
> > > the parent's identity and schema).
> > >
> >
> > Yeah, but currently BOTH the tap_pub_viaroot_2, tap_pub_viaroot_1 are
> > using "WITH (publish_via_partition_root)", so IMO the user would
> > surely expect that only the root table would be published even when a
> > subscription combines those publications. OTOH, I thought a subtle
> > point of this patch is that now the same result will happen even if
> > only ONE of the publications was using "WITH
> > (publish_via_partition_root)". So it’s that scenario of “only ONE
> > publication is using the option” that I thought ought to be explicitly
> > tested.
> >
>
> The current change to existing tests is difficult to understand. I
> suggest writing a separate test for row filter and then cover the
> scenario you have suggested.

Changed as suggested.

And I found there is a problem in the three back-branch patches (HEAD_v21_0002*,
REL15_* and REL14_*):
In the function fetch_table_list, we use pg_partition_ancestors to get the list
of tables from the publisher. But the pg_partition_ancestors was introduced in
v12, which means that if the publisher is v11 and the subscriber is v14+, this
will cause an error.
Since we are going to first submit the fix for the publisher >= 16 case on HEAD,
I think we could discuss this issue later if needed. Also, I will update these
three patches later if needed.

Attach the new patch.

Regards,
Wang Wei

Attachment Content-Type Size
v22-0001-Fix-data-replicated-twice-when-specifying-publis.patch application/octet-stream 27.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message wangw.fnst@fujitsu.com 2023-03-24 09:06:48 RE: Data is copied twice when specifying both child and parent table in publication
Previous Message Daniel Gustafsson 2023-03-24 08:56:29 Re: Raising the SCRAM iteration count