RE: Data is copied twice when specifying both child and parent table in publication

From: "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, vignesh C <vignesh21(at)gmail(dot)com>, "Takamichi Osumi (Fujitsu)" <osumi(dot)takamichi(at)fujitsu(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>
Subject: RE: Data is copied twice when specifying both child and parent table in publication
Date: 2023-03-21 07:40:00
Message-ID: OS3PR01MB6275D2940C734AF8C77435739E819@OS3PR01MB6275.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Mar 20, 2023 at 18:15 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>

Thanks for your comments.

> On Mon, Mar 20, 2023 at 1:02 PM Peter Smith <smithpb2250(at)gmail(dot)com>
> wrote:
> >
> >
> > ======
> > src/include/catalog/pg_proc.dat
> >
> > 4.
> > +{ oid => '6119',
> > + descr => 'get information of the tables in the given publication array',
> >
> > Should that be worded in a way to make it more clear that the
> > "publication array" is really an "array of publication names"?
> >
>
> I don't know how important it is to tell that the array is an array of
> publication names but the current description can be improved. How
> about something like: "get information of the tables that are part of
> the specified publications"

Changed.

> Few other comments:
> =================
> 1.
> foreach(lc2, ancestors)
> {
> Oid ancestor = lfirst_oid(lc2);
> + ListCell *lc3;
>
> /* Check if the parent table exists in the published table list. */
> - if (list_member_oid(relids, ancestor))
> + foreach(lc3, table_infos)
> {
> - skip = true;
> - break;
> + Oid relid = ((published_rel *) lfirst(lc3))->relid;
> +
> + if (relid == ancestor)
> + {
> + skip = true;
> + break;
> + }
> }
> +
> + if (skip)
> + break;
> }
>
> - if (!skip)
> - result = lappend_oid(result, relid);
> + if (skip)
> + table_infos = foreach_delete_current(table_infos, lc);
>
> The usage of skip looks a bit ugly to me. Can we move the code for the
> inner loop to a separate function (like
> is_ancestor_member_tableinfos()) and remove the current cell if it
> returns true?

Changed.

> 2.
> * Filter out the partitions whose parent tables were also specified in
> * the publication.
> */
> -static List *
> -filter_partitions(List *relids)
> +static void
> +filter_partitions(List *table_infos)
>
> The comment atop filter_partitions is no longer valid. Can we slightly
> change it to: "Filter out the partitions whose parent tables are also
> present in the list."?

Changed.

> 3.
> -# Note: We create two separate tables, not a partitioned one, so that we can
> -# easily identity through which relation were the changes replicated.
> +# Note: We only create one table (tab4) here. We specified
> +# publish_via_partition_root = true (see pub_all and pub_lower_level above), so
> +# all data will be replicated to that table.
> $node_subscriber2->safe_psql('postgres',
> "CREATE TABLE tab4 (a int PRIMARY KEY)");
> -$node_subscriber2->safe_psql('postgres',
> - "CREATE TABLE tab4_1 (a int PRIMARY KEY)");
>
> I am not sure if it is a good idea to remove tab4_1 here. It is
> testing something different as mentioned in the comments. Also, I
> don't see any data in tab4 for the initial sync, so not sure if this
> tests the behavior changed by this patch.

Reverted this change. And inserted the initial sync data into table tab4 to test
this more clearly.

> 4.
> --- a/src/test/subscription/t/031_column_list.pl
> +++ b/src/test/subscription/t/031_column_list.pl
> @@ -959,7 +959,8 @@ $node_publisher->safe_psql(
> CREATE TABLE test_root_1 PARTITION OF test_root FOR VALUES FROM (1) TO
> (10);
> CREATE TABLE test_root_2 PARTITION OF test_root FOR VALUES FROM (10) TO
> (20);
>
> - CREATE PUBLICATION pub_root_true FOR TABLE test_root (a) WITH
> (publish_via_partition_root = true);
> + CREATE PUBLICATION pub_root_true_1 FOR TABLE test_root (a) WITH
> (publish_via_partition_root = true);
> + CREATE PUBLICATION pub_root_true_2 FOR TABLE test_root_1 (a, b) WITH
> (publish_via_partition_root = true);
>
> -- initial data
> INSERT INTO test_root VALUES (1, 2, 3);
> @@ -968,7 +969,7 @@ $node_publisher->safe_psql(
>
> $node_subscriber->safe_psql(
> 'postgres', qq(
> - CREATE SUBSCRIPTION sub1 CONNECTION '$publisher_connstr' PUBLICATION
> pub_root_true;
> + CREATE SUBSCRIPTION sub1 CONNECTION '$publisher_connstr' PUBLICATION
> pub_root_true_1, pub_root_true_2;
>
> It is not clear to me what exactly you want to test here. Please add
> some comments.

Tried to add the following comment to make it clear:
```
+# Subscribe to pub_root_true_1 and pub_root_true_2 at the same time, which
+# means that the initial data will be synced once, and only the column list of
+# the parent table (test_root) in the publication pub_root_true_1 will be used
+# for both table sync and data replication.
$node_subscriber->safe_psql(
'postgres', qq(
- CREATE SUBSCRIPTION sub1 CONNECTION '$publisher_connstr' PUBLICATION pub_root_true;
+ CREATE SUBSCRIPTION sub1 CONNECTION '$publisher_connstr' PUBLICATION pub_root_true_1, pub_root_true_2;
```

> 5. I think you can merge the 0001 and 0003 patches.

Merged.

> Apart from the above, attached is a patch to change some of the
> comments in the patch.

Thanks for this improvement. I've checked and merged it.

Attach the new patch set.

Regards,
Wang Wei

Attachment Content-Type Size
HEAD_v19-0001-Fix-data-replicated-twice-when-specifying-publis.patch application/octet-stream 24.5 KB
HEAD_v19-0002-Fix-this-problem-for-back-branches.patch application/octet-stream 2.5 KB
REL15_v19-0001-Fix-data-replicated-twice-when-specifying-publis_patch application/octet-stream 10.4 KB
REL14_v19-0001-Fix-data-replicated-twice-when-specifying-publis_patch application/octet-stream 6.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message wangw.fnst@fujitsu.com 2023-03-21 07:40:16 RE: Data is copied twice when specifying both child and parent table in publication
Previous Message Eske Rahn 2023-03-21 07:38:12 Options to rowwise persist result of stable/immutable function with RECORD result