RE: Data is copied twice when specifying both child and parent table in publication

From: "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, vignesh C <vignesh21(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "Takamichi Osumi (Fujitsu)" <osumi(dot)takamichi(at)fujitsu(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>
Subject: RE: Data is copied twice when specifying both child and parent table in publication
Date: 2023-03-17 06:28:00
Message-ID: OSZPR01MB62783EDCE9DDC21D35B789209EBD9@OSZPR01MB6278.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

On Thu, Mar 16, 2023 at 20:25 Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>

Thanks for your comments.

> + if (server_version >= 160000)
> + {
> + appendStringInfo(&cmd, "SELECT DISTINCT N.nspname, C.relname,\n"
> + " ( SELECT array_agg(a.attname ORDER BY a.attnum)\n"
> + " FROM pg_attribute a\n"
> + " WHERE a.attrelid = GPT.relid AND a.attnum > 0 AND\n"
> + " NOT a.attisdropped AND\n"
> + " (a.attnum = ANY(GPT.attrs) OR GPT.attrs IS NULL)\n"
> + " ) AS attnames\n"
> + " FROM pg_class C\n"
> + " JOIN pg_namespace N ON N.oid = C.relnamespace\n"
> + " JOIN ( SELECT (pg_get_publication_tables(VARIADIC array_agg(pubname::text))).*\n"
> + " FROM pg_publication\n"
> + " WHERE pubname IN ( %s )) as GPT\n"
> + " ON GPT.relid = C.oid\n",
> + pub_names.data);
>
> The function pg_get_publication_tables() has already handled dropped
> columns, so we don't need it here in this query. Also, the part to
> build attnames should be the same as it is in view
> pg_publication_tables.

Agreed. Changed.

> Can we directly try to pass the list of
> pubnames to the function pg_get_publication_tables() instead of
> joining it with pg_publication?

Changed.
I think the earlier join with pg_publication was there to exclude
non-existent publications. Without it, we would get an error, because function
pg_get_publication_tables calls function GetPublicationByName with
'missing_ok = false'. So, I changed "missing_ok" to true. If anyone objects to
this change, I'll reconsider it in the next version.
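
For illustration only, here is a rough sketch of what passing the publication
names directly could look like. It is not copied from the attached patch; the
names 'pub1' and 'pub2' are placeholders, and the column list is simplified to
return gpt.attrs (column numbers) instead of building attnames.

```sql
-- Sketch only: pass the publication names straight to the function,
-- without joining pg_publication first.
-- 'pub1' and 'pub2' are placeholder names, not from the patch.
SELECT DISTINCT n.nspname, c.relname, gpt.attrs
  FROM pg_class c
  JOIN pg_namespace n ON n.oid = c.relnamespace
  JOIN ( SELECT (pg_get_publication_tables(VARIADIC ARRAY['pub1', 'pub2'])).* ) AS gpt
    ON gpt.relid = c.oid;
-- If the function looks up each name with missing_ok = false, a name that
-- does not exist raises an error here; with missing_ok = true it would
-- simply contribute no rows.
```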

> Can we keep the changes in the else part (fix when publisher < 16) the
> same as HEAD and move the proposed change to a separate patch?
> Basically, for the HEAD patch, let's just try to fix this when
> publisher >=16. I am slightly worried that as this is a corner case
> bug and we didn't see any user complaints for this, so introducing a
> complex fix for back branches may not be required or at least we can
> discuss that separately.

I have split the patch as suggested.

Attached is the new patch set.

Regards,
Wang Wei

Attachment Content-Type Size
HEAD-v17-0001-Fix-data-replicated-twice-when-specifying-publis.patch application/octet-stream 21.7 KB
HEAD-v17-0002-Fix-this-problem-for-back-branches.patch application/octet-stream 2.5 KB
HEAD-v17-0003-Add-clarification-for-the-behaviour-of-the-publi.patch application/octet-stream 2.4 KB
REL14_v17-0001-Fix-data-replicated-twice-when-specifying-publis_patch application/octet-stream 5.3 KB
REL15_v17-0001-Fix-data-replicated-twice-when-specifying-publis_patch application/octet-stream 8.6 KB
