Re: Data is copied twice when specifying both child and parent table in publication

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, vignesh C <vignesh21(at)gmail(dot)com>, "Takamichi Osumi (Fujitsu)" <osumi(dot)takamichi(at)fujitsu(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>
Subject: Re: Data is copied twice when specifying both child and parent table in publication
Date: 2023-03-24 01:48:47
Message-ID: CAHut+PtvAih2-8qKyTWbfEd4=4ia=TfsM4hzwTPQGajPKJ-1bA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Wang-san. I looked at the v21-0001 patch.

I don't have any new review comments -- only follow-ups for some of my
previous v20 comments that were rejected.

On Thu, Mar 23, 2023 at 8:11 PM wangw(dot)fnst(at)fujitsu(dot)com
<wangw(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Thu, Mar 23, 2023 at 12:27 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> > Here are some review comments for patch v20-0001.
>
...
> > ======
> > src/test/subscription/t/013_partition.pl
> >
> > 5.
> > I think maybe you could have another test scenario where you INSERT
> > something into tab4_1_1 while only the publication for tab4_1 has
> > publish_via_partition_root=true
>
> I'm not sure if this scenario is necessary.
>

Please see my reply for #7 below.

> > ~~~
> >
> > 6.
> > AFAICT the tab4 tests are only testing the initial sync, but are not
> > testing normal replication. Maybe some more normal (post sync) INSERTS
> > are needed to tab4, tab4_1, tab4_1_1 ?
>
> Since I think the scenario we fixed is sync and not replication, it doesn't seem
> like we should extend the test you mentioned.
>

Maybe you are right. I only thought it would be better to have testing
which verifies that the sync phase and the normal replication phase
are using the same rules.

> > ======
> > src/test/subscription/t/028_row_filter.pl
> >
> > 7.
> > +# insert data into partitioned table.
> > +$node_publisher->safe_psql('postgres',
> > + "INSERT INTO tab_rowfilter_viaroot_part(a) VALUES(13), (17)");
> > +
> > $node_subscriber->safe_psql('postgres',
> > "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr
> > application_name=$appname' PUBLICATION tap_pub_1, tap_pub_2,
> > tap_pub_3, tap_pub_4a, tap_pub_4b, tap_pub_5a, tap_pub_5b,
> > tap_pub_toast, tap_pub_inherits, tap_pub_viaroot_2, tap_pub_viaroot_1"
> > );
> > @@ -707,13 +711,17 @@ is($result, qq(t|1), 'check replicated rows to
> > tab_rowfilter_toast');
> > # the row filter for the top-level ancestor:
> > #
> > # tab_rowfilter_viaroot_part filter is: (a > 15)
> > +# - INSERT (13) NO, 13 < 15
> > # - INSERT (14) NO, 14 < 15
> > # - INSERT (15) NO, 15 = 15
> > # - INSERT (16) YES, 16 > 15
> > +# - INSERT (17) YES, 17 > 15
> > $result =
> > $node_subscriber->safe_psql('postgres',
> > - "SELECT a FROM tab_rowfilter_viaroot_part");
> > -is($result, qq(16), 'check replicated rows to tab_rowfilter_viaroot_part');
> > + "SELECT a FROM tab_rowfilter_viaroot_part ORDER BY 1");
> > +is( $result, qq(16
> > +17),
> > + 'check replicated rows to tab_rowfilter_viaroot_part');
> >
> > ~
> >
> > I'm not 100% sure this is testing quite what you want to test. AFAICT
> > the subscription is created like:
> >
> > "CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr
> > application_name=$appname' PUBLICATION tap_pub_1, tap_pub_2,
> > tap_pub_3, tap_pub_4a, tap_pub_4b, tap_pub_5a, tap_pub_5b,
> > tap_pub_toast, tap_pub_inherits, tap_pub_viaroot_2, tap_pub_viaroot_1"
>
> I think this is the scenario we fixed : Simultaneously subscribing to two
> publications that publish the parent and child respectively, then want to use
> the parent's identity and schema).
>

Yeah, but currently BOTH the tap_pub_viaroot_2, tap_pub_viaroot_1 are
using "WITH (publish_via_partition_root)", so IMO the user would
surely expect that only the root table would be published even when a
subscription combines those publications. OTOH, I thought a subtle
point of this patch is that now the same result will happen even if
only ONE of the publications was using "WITH
(publish_via_partition_root)". So it’s that scenario of “only ONE
publication is using the option” that I thought ought to be explicitly
tested.

This was the same also reason for my comment #5 above.

> > ======
> > src/test/subscription/t/031_column_list.pl
> >
> > 8.
> > - CREATE PUBLICATION pub_root_true FOR TABLE test_root (a) WITH
> > (publish_via_partition_root = true);
> > + CREATE PUBLICATION pub_root_true_1 FOR TABLE test_root (a) WITH
> > (publish_via_partition_root = true);
> > + CREATE PUBLICATION pub_root_true_2 FOR TABLE test_root_1 (a, b) WITH
> > (publish_via_partition_root = true);
> >
> > -- initial data
> > INSERT INTO test_root VALUES (1, 2, 3);
> > INSERT INTO test_root VALUES (10, 20, 30);
> > ));
> >
> > +# Subscribe to pub_root_true_1 and pub_root_true_2 at the same time, which
> > +# means that the initial data will be synced once, and only the column list of
> > +# the parent table (test_root) in the publication pub_root_true_1 will be used
> > +# for both table sync and data replication.
> > $node_subscriber->safe_psql(
> > 'postgres', qq(
> > - CREATE SUBSCRIPTION sub1 CONNECTION '$publisher_connstr' PUBLICATION
> > pub_root_true;
> > + CREATE
> >
> > ~
> >
> > (This is similar to the previous review comment #7 above)
> >
> > Won't it be a better test of the "At least one" code when only the
> > publication of partition (test_root_1) is using "WITH
> > (publish_via_partition_root = true)".
> >
> > e.g
> > CREATE PUBLICATION pub_root_true_1 FOR TABLE test_root (a);
> > CREATE PUBLICATION pub_root_true_2 FOR TABLE test_root_1 (a, b) WITH
> > (publish_via_partition_root = true);
>
> I think specifying one or both is the same scenario here.
> But it seemed clearer if only the "via_root" option is specified in the
> publication that publishes the parent, so I changed this point in
> "031_column_list.pl". Since the publications in "028_row_filter.pl" were
> introduced by other commits, I didn't change it.
>

In hindsight, I think those publications should be renamed to
something more appropriate. The name "pub_root_true_2" seems
misleading now since the publish_via_partition_root = false

e.g.1. pub_test_root, pub_test_root_1
or
e.g.2. pub_root_true, pub_root_1_false
etc.

------
Kind Regards,
Peter Smith.
Fujitsu Australia.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Justin Pryzby 2023-03-24 01:50:50 Re: Improve logging when using Huge Pages
Previous Message Michael Paquier 2023-03-24 01:24:43 Re: Commitfest 2023-03 starting tomorrow!