RE: Data is copied twice when specifying both child and parent table in publication

From: "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: RE: Data is copied twice when specifying both child and parent table in publication
Date: 2022-08-05 09:06:49
Message-ID: OS3PR01MB627506056B78D533FBB331DC9E9E9@OS3PR01MB6275.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thur, Jul 28, 2022 at 17:17 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> Here are some review comments for the HEAD_v7-0001 patch:

Thanks for your comments.

> 2. Commit message.
>
> 2a.
>
> If there are two publications that publish the parent table and the child table
> separately, and both specify the option publish_via_partition_root, subscribing
> to both publications from one subscription causes initial copy twice.
>
> SUGGESTION
> If there are two publications that publish the parent table and the child table
> respectively, but both specify publish_via_partition_root = true, subscribing
> to both publications from one subscription causes initial copy twice.
>
> 2b. <General>
>
> Actually, isn't it more subtle than what that comment is describing?
> Maybe nobody is explicitly publishing a parent table at all. Maybe
> pub1 publishes partition1 and pub2 publishes partition2, but both
> publications are using publish_via_partition_root = true. Is this
> scenario even tested? Does the logic of pg_get_publication_tables
> cover this scenario?

=>2a.
Okay, changed it as suggested.

=>2b.
This is not the case we are trying to fix. The problematic scenario is when the
a parent table is published via root partitioned table and in this case we need
to ignore other partitions. And I try to improve the commit message to make it
clear.

> 4.
>
> + /* Filter by final published table. */
> + foreach(lc, results)
> + {
> + Oid *table_info = (Oid *) lfirst(lc);
> +
> + if (!list_member_oid(tables, table_info[0]))
> + results = foreach_delete_current(results, lc);
> }
>
> The comment did not convey enough meaning. Can you make it more
> descriptive to explain why/what the logic is doing here?

I think the comments above `tables = filter_partitions(tables);` explain this.

> 5. src/backend/commands/subscriptioncmds.c - fetch_table_list
>
> /* Get column lists for each relation if the publisher supports it */
> - if (check_columnlist)
> - appendStringInfoString(&cmd, ", t.attnames\n");
> + if (server_version >= 160000)
> + appendStringInfo(&cmd, "SELECT DISTINCT n.nspname, c.relname,\n"
>
> That comment is exactly the same as it was before the patch. But it
> doesn't seem quite appropriate anymore for this new condition and this
> new query.

Improved the comments as following:
```
Get information of the tables belonging to the specified publications
```

The rest of the comments are improved as suggested.
I also rebased the patch based on the commit (0c20dd3) on HEAD, and made some
changes to the back-branch patches based on some of Peter's comments.

Attach the new patches.

Regards,
Wang wei

Attachment Content-Type Size
HEAD_v8-0001-Fix-data-replicated-twice-when-specifying-publish.patch application/octet-stream 18.0 KB
REL15_v8-0001-Fix-data-replicated-twice-when-specifying-publish_patch application/octet-stream 8.4 KB
REL14_v8-0001-Fix-data-replicated-twice-when-specifying-publish_patch application/octet-stream 5.2 KB
REL13_v8-0001-Fix-data-replicated-twice-when-specifying-publish_patch application/octet-stream 5.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2022-08-05 09:17:45 Re: Checking pgwin32_is_junction() errors
Previous Message Kyotaro Horiguchi 2022-08-05 08:40:01 Re: Patch to address creation of PgStat* contexts with null parent context