Re: Data is copied twice when specifying both child and parent table in publication

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, vignesh C <vignesh21(at)gmail(dot)com>, "Takamichi Osumi (Fujitsu)" <osumi(dot)takamichi(at)fujitsu(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>
Subject: Re: Data is copied twice when specifying both child and parent table in publication
Date: 2023-03-20 10:14:49
Message-ID: CAA4eK1Jpj2Yj5ZWwt4z-qM2fS0c3ZF3_xqAgor-Ziiru00LKHg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Mar 20, 2023 at 1:02 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
>
> ======
> src/include/catalog/pg_proc.dat
>
> 4.
> +{ oid => '6119',
> + descr => 'get information of the tables in the given publication array',
>
> Should that be worded in a way to make it more clear that the
> "publication array" is really an "array of publication names"?
>

I don't know how important it is to tell that the array is an array of
publication names but the current description can be improved. How
about something like: "get information of the tables that are part of
the specified publications"

Few other comments:
=================
1.
foreach(lc2, ancestors)
{
Oid ancestor = lfirst_oid(lc2);
+ ListCell *lc3;

/* Check if the parent table exists in the published table list. */
- if (list_member_oid(relids, ancestor))
+ foreach(lc3, table_infos)
{
- skip = true;
- break;
+ Oid relid = ((published_rel *) lfirst(lc3))->relid;
+
+ if (relid == ancestor)
+ {
+ skip = true;
+ break;
+ }
}
+
+ if (skip)
+ break;
}

- if (!skip)
- result = lappend_oid(result, relid);
+ if (skip)
+ table_infos = foreach_delete_current(table_infos, lc);

The usage of skip looks a bit ugly to me. Can we move the code for the
inner loop to a separate function (like
is_ancestor_member_tableinfos()) and remove the current cell if it
returns true?

2.
* Filter out the partitions whose parent tables were also specified in
* the publication.
*/
-static List *
-filter_partitions(List *relids)
+static void
+filter_partitions(List *table_infos)

The comment atop filter_partitions is no longer valid. Can we slightly
change it to: "Filter out the partitions whose parent tables are also
present in the list."?

3.
-# Note: We create two separate tables, not a partitioned one, so that we can
-# easily identity through which relation were the changes replicated.
+# Note: We only create one table (tab4) here. We specified
+# publish_via_partition_root = true (see pub_all and pub_lower_level above), so
+# all data will be replicated to that table.
$node_subscriber2->safe_psql('postgres',
"CREATE TABLE tab4 (a int PRIMARY KEY)");
-$node_subscriber2->safe_psql('postgres',
- "CREATE TABLE tab4_1 (a int PRIMARY KEY)");

I am not sure if it is a good idea to remove tab4_1 here. It is
testing something different as mentioned in the comments. Also, I
don't see any data in tab4 for the initial sync, so not sure if this
tests the behavior changed by this patch.

4.
--- a/src/test/subscription/t/031_column_list.pl
+++ b/src/test/subscription/t/031_column_list.pl
@@ -959,7 +959,8 @@ $node_publisher->safe_psql(
CREATE TABLE test_root_1 PARTITION OF test_root FOR VALUES FROM (1) TO (10);
CREATE TABLE test_root_2 PARTITION OF test_root FOR VALUES FROM (10) TO (20);

- CREATE PUBLICATION pub_root_true FOR TABLE test_root (a) WITH
(publish_via_partition_root = true);
+ CREATE PUBLICATION pub_root_true_1 FOR TABLE test_root (a) WITH
(publish_via_partition_root = true);
+ CREATE PUBLICATION pub_root_true_2 FOR TABLE test_root_1 (a, b) WITH
(publish_via_partition_root = true);

-- initial data
INSERT INTO test_root VALUES (1, 2, 3);
@@ -968,7 +969,7 @@ $node_publisher->safe_psql(

$node_subscriber->safe_psql(
'postgres', qq(
- CREATE SUBSCRIPTION sub1 CONNECTION '$publisher_connstr' PUBLICATION
pub_root_true;
+ CREATE SUBSCRIPTION sub1 CONNECTION '$publisher_connstr' PUBLICATION
pub_root_true_1, pub_root_true_2;

It is not clear to me what exactly you want to test here. Please add
some comments.

5. I think you can merge the 0001 and 0003 patches.

Apart from the above, attached is a patch to change some of the
comments in the patch.

--
With Regards,
Amit Kapila.

Attachment Content-Type Size
change_comments_1.patch.txt text/plain 1.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Matthias van de Meent 2023-03-20 10:24:20 Re: Ignoring BRIN for HOT updates (was: -udpates seems broken)
Previous Message Tomas Vondra 2023-03-20 10:11:27 Re: Ignoring BRIN for HOT updates (was: -udpates seems broken)