RE: Data is copied twice when specifying both child and parent table in publication

From: "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>
To: "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: RE: Data is copied twice when specifying both child and parent table in publication
Date: 2022-05-09 01:51:09
Message-ID: OS3PR01MB62751C2FA25E23E24AB6980B9EC69@OS3PR01MB6275.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Apr 28, 2022 9:22 AM Shi, Yu/侍 雨 <shiy(dot)fnst(at)cn(dot)fujitsu(dot)com> wrote:
> Thanks for your patches.
>
> Here's a comment on the patch for REL14.
Thanks for your comments.

> + appendStringInfo(&cmd, "SELECT DISTINCT ns.nspname, c.relname\n"
> + " FROM
> pg_catalog.pg_publication_tables t\n"
> + " JOIN pg_catalog.pg_namespace
> ns\n"
> + " ON ns.nspname =
> t.schemaname\n"
> + " JOIN pg_catalog.pg_class c\n"
> + " ON c.relname = t.tablename AND
> c.relnamespace = ns.oid\n"
> + " WHERE t.pubname IN (%s)\n"
> + " AND (c.relispartition IS FALSE\n"
> + " OR NOT EXISTS\n"
> + " ( SELECT 1 FROM
> pg_partition_ancestors(c.oid) as relid\n"
> + " WHERE relid IN\n"
> + " (SELECT DISTINCT (schemaname
> || '.' || tablename)::regclass::oid\n"
> + " FROM
> pg_catalog.pg_publication_tables t\n"
> + " WHERE t.pubname IN (%s))\n"
> + " AND relid != c.oid))\n",
> + pub_names.data, pub_names.data);
>
> I think we can use an alias like 'pa' for pg_partition_ancestors, and modify the
> SQL as follows.
>
> + appendStringInfo(&cmd, "SELECT DISTINCT ns.nspname, c.relname\n"
> + " FROM
> pg_catalog.pg_publication_tables t\n"
> + " JOIN pg_catalog.pg_namespace
> ns\n"
> + " ON ns.nspname =
> t.schemaname\n"
> + " JOIN pg_catalog.pg_class c\n"
> + " ON c.relname = t.tablename AND
> c.relnamespace = ns.oid\n"
> + " WHERE t.pubname IN (%s)\n"
> + " AND (c.relispartition IS FALSE\n"
> + " OR NOT EXISTS\n"
> + " ( SELECT 1 FROM
> pg_partition_ancestors(c.oid) pa\n"
> + " WHERE pa.relid IN\n"
> + " (SELECT DISTINCT
> (t.schemaname || '.' || t.tablename)::regclass::oid\n"
> + " FROM
> pg_catalog.pg_publication_tables t\n"
> + " WHERE t.pubname IN (%s))\n"
> + " AND pa.relid != c.oid))\n",
> + pub_names.data, pub_names.data);
Fix it.

In addition, I try to modify the approach for the HEAD.
I enhance the API of function pg_get_publication_tables. Change the parameter
type from 'text' to 'any'. Then we can use this function to get tables from one
publication or an array of publications. Any thoughts on this approach?

Attach new patches.
The patch for HEAD:
1. Modify the approach. Enhance the API of function pg_get_publication_tables to
handle one publication or an array of publications.
The patch for REL14:
1. Improve the table sync SQL. [suggestions by Shi yu]

Regards,
Wang wei

Attachment Content-Type Size
HEAD_v2-0001-Fix-data-replicated-twice-when-specifying-PUBLISH.patch application/octet-stream 14.6 KB
REL14_v4-0001-Fix-data-replicated-twice-when-specifying-PUBLISH.patch application/octet-stream 5.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2022-05-09 03:18:39 Re: Rewriting the test of pg_upgrade as a TAP test - take three - remastered set
Previous Message Tom Lane 2022-05-09 01:25:57 Re: failures in t/031_recovery_conflict.pl on CI