Re: Data is copied twice when specifying both child and parent table in publication

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: Re: Data is copied twice when specifying both child and parent table in publication
Date: 2022-07-28 09:17:03
Message-ID: CAHut+Pv90cYtdO3L+EL82RjyLNpJ6VY6DsTp_-nRWA4zhb9kag@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here are some review comments for the HEAD_v7-0001 patch:

======

1. <General>

I have a fundamental question about this patch.

IIUC the purpose of this patch is to ensure that (when
publish_via_root = true) the copy of the partition data will happen
only once (e.g. from one parent table on one of the publishers). But I
think there is no guarantee that those 2 publishers even had the same
data, right? Therefore it seems to me you could get different results
if the data were copied from pub1 or from pub2. (I have not tried it -
this is just my suspicion).

Am I correct or mistaken? If correct, then it means there is a big
(but subtle) difference related to the ordering of processing ... A)
is this explicitly documented so the user knows what data to expect?
B) is the effect of different ordering tested anywhere? Actually, I
have no idea what exactly determines the order – is it the original
CREATE SUBSCRIPTION publication list order? Is it the logic of the
pg_get_publication_tables function? Is it the SQL in function
fetch_table_list? Or is it not deterministic at all? Please confirm
it.

======

2. Commit message.

2a.

If there are two publications that publish the parent table and the child table
separately, and both specify the option publish_via_partition_root, subscribing
to both publications from one subscription causes initial copy twice.

SUGGESTION
If there are two publications that publish the parent table and the child table
respectively, but both specify publish_via_partition_root = true, subscribing
to both publications from one subscription causes initial copy twice.

2b. <General>

Actually, isn't it more subtle than what that comment is describing?
Maybe nobody is explicitly publishing a parent table at all. Maybe
pub1 publishes partition1 and pub2 publishes partition2, but both
publications are using publish_via_partition_root = true. Is this
scenario even tested? Does the logic of pg_get_publication_tables
cover this scenario?

======

3. src/backend/catalog/pg_publication.c - pg_get_publication_tables

pg_get_publication_tables(PG_FUNCTION_ARGS)
{
#define NUM_PUBLICATION_TABLES_ELEM 3
- FuncCallContext *funcctx;
- char *pubname = text_to_cstring(PG_GETARG_TEXT_PP(0));
- Publication *publication;
- List *tables;
+ FuncCallContext *funcctx;
+ Publication *publication;

Something seems strange about having a common Publication declaration.
Firstly it is used to represent every publication element in the array
loop. Later, it is overwritten to represent a single publication.

I think it might be easier if you declare these variables separately:

E.g.1
Publication *pub_elem; -- for the array element processing declared
within the for loop

E.g.2
Publication *pub; -- declared within if (funcctx->call_cntr <
list_length(results))

~~~

4.

+ /* Filter by final published table. */
+ foreach(lc, results)
+ {
+ Oid *table_info = (Oid *) lfirst(lc);
+
+ if (!list_member_oid(tables, table_info[0]))
+ results = foreach_delete_current(results, lc);
}

The comment did not convey enough meaning. Can you make it more
descriptive to explain why/what the logic is doing here?

======

5. src/backend/commands/subscriptioncmds.c - fetch_table_list

/* Get column lists for each relation if the publisher supports it */
- if (check_columnlist)
- appendStringInfoString(&cmd, ", t.attnames\n");
+ if (server_version >= 160000)
+ appendStringInfo(&cmd, "SELECT DISTINCT n.nspname, c.relname,\n"

That comment is exactly the same as it was before the patch. But it
doesn't seem quite appropriate anymore for this new condition and this
new query.

~~~

6.

+ /*
+ * Get the list of tables from publisher, the partition table whose
+ * ancestor is also in this list should be ignored, otherwise the
+ * initial data in the partition table would be replicated twice.
+ */

Why say "should be ignored" -- don’t you mean "will be" or "must be" or "is".

~~~

7.

initStringInfo(&cmd);
- appendStringInfoString(&cmd, "SELECT DISTINCT t.schemaname, t.tablename \n");

/* Get column lists for each relation if the publisher supports it */
- if (check_columnlist)
- appendStringInfoString(&cmd, ", t.attnames\n");
+ if (server_version >= 160000)
+ appendStringInfo(&cmd, "SELECT DISTINCT n.nspname, c.relname,\n"
+ " ( CASE WHEN (array_length(gpt.attrs, 1) = c.relnatts)\n"
+ " THEN NULL ELSE gpt.attrs END\n"
+ " ) AS attnames\n"
+ " FROM pg_class c\n"
+ " JOIN pg_namespace n ON n.oid = c.relnamespace\n"
+ " JOIN ( SELECT (pg_get_publication_tables(VARIADIC
array_agg(pubname::text))).*\n"
+ " FROM pg_publication\n"
+ " WHERE pubname IN ( %s )) as gpt\n"
+ " ON gpt.relid = c.oid\n",
+ pub_names.data);
+ else
+ {
+ /*
+ * Get the list of tables from publisher, the partition table whose
+ * ancestor is also in this list should be ignored, otherwise the
+ * initial data in the partition table would be replicated twice.
+ */

- appendStringInfoString(&cmd, "FROM pg_catalog.pg_publication_tables t\n"
- " WHERE t.pubname IN (");
- get_publications_str(publications, &cmd, true);
- appendStringInfoChar(&cmd, ')');
+ appendStringInfoString(&cmd, "WITH pub_tabs AS(\n"
+ " SELECT DISTINCT N.nspname, C.oid, C.relname, C.relispartition\n");
+
+ /* Get column lists for each relation if the publisher supports it */
+ if (check_columnlist)
+ appendStringInfoString(&cmd, ",( CASE WHEN (array_length(gpt.attrs,
1) = c.relnatts)\n"
+ " THEN NULL ELSE gpt.attrs END\n"
+ " ) AS attnames\n");
+
+ appendStringInfo(&cmd, " FROM pg_publication P,\n"
+ " LATERAL pg_get_publication_tables(P.pubname) GPT,\n"
+ " pg_class C JOIN pg_namespace N ON (N.oid = C.relnamespace)\n"
+ " WHERE C.oid = GPT.relid AND P.pubname IN ( %s )\n"
+ ")\n"
+ "SELECT DISTINCT pub_tabs.nspname, pub_tabs.relname\n",
+ pub_names.data);
+
+ /* Get column lists for each relation if the publisher supports it */
+ if (check_columnlist)
+ appendStringInfoString(&cmd, ", pub_tabs.attnames\n");
+
+ appendStringInfoString(&cmd, "FROM pub_tabs\n"
+ " WHERE (pub_tabs.relispartition IS FALSE\n"
+ " OR NOT EXISTS (SELECT 1 FROM
pg_partition_ancestors(pub_tabs.oid) as pa\n"
+ " WHERE pa.relid IN (SELECT pub_tabs.oid FROM pub_tabs)\n"
+ " AND pa.relid != pub_tabs.oid))\n");
+ }

Please use a consistent case for all the SQL aliases. E.g "gpt" versus
"GPT", "c" versus "C", etc.

======

8. src/test/subscription/t/013_partition.pl

+# Note: We only create one table for the partitioned table (tab4) here. Because
+# we specify option "publish_via_partition_root" (see pub_all and
+# pub_lower_level above), all data should be replicated to the partitioned
+# table. So we do not need to create table for the partition table.

"replicated to the partitioned table" ??

The entire comment seems a bit misleading because how can we call the
subscriber table a "partitioned" table when it has no partitions?!

SUGGESTION (maybe?)
Note: We only create one table (tab4) here. We specified
publish_via_partition_root = true (see pub_all and pub_lower_level
above), so all data will be replicated to that table.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2022-07-28 09:31:44 Re: Checking pgwin32_is_junction() errors
Previous Message Amit Kapila 2022-07-28 08:57:23 Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns