RE: Data is copied twice when specifying both child and parent table in publication

From: "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>
To: "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: RE: Data is copied twice when specifying both child and parent table in publication
Date: 2022-05-13 02:02:25
Message-ID: OS3PR01MB6275E64C5C8BD561FD6E57359ECA9@OS3PR01MB6275.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thur, May 12, 2022 9:48 AM osumi(dot)takamichi(at)fujitsu(dot)com <osumi(dot)takamichi(at)fujitsu(dot)com> wrote:
> On Wednesday, May 11, 2022 11:33 AM I wrote:
> > On Monday, May 9, 2022 10:51 AM wangw(dot)fnst(at)fujitsu(dot)com
> > <wangw(dot)fnst(at)fujitsu(dot)com> wrote:
> > > Attach new patches.
> > > The patch for HEAD:
> > > 1. Modify the approach. Enhance the API of function
> > > pg_get_publication_tables to handle one publication or an array of
> > > publications.
> > > The patch for REL14:
> > > 1. Improve the table sync SQL. [suggestions by Shi yu]
> > Hi, thank you for updating the patch !
> >
> > Minor comments on your patch for HEAD v2.
Thanks for your comments.

> > (1) commit message sentence
> >
> > I suggest below sentence.
> >
> > Kindly change from
> > "... when subscribing to both publications using one subscription, the data is
> > replicated twice in inital copy"
> > to "subscribing to both publications from one subscription causes initial copy
> > twice".
Improve it according to your suggestion.

> > (2) unused variable
> >
> > pg_publication.c: In function ‘pg_get_publication_tables’:
> > pg_publication.c:1091:11: warning: unused variable ‘pubname’
> > [-Wunused-variable]
> > char *pubname;
> >
> > We can remove this.
Fix it.

> > (3) free of allocated memory
> >
> > In the pg_get_publication_tables(),
> > we don't free 'elems'. Don't we need it ?
Improve it according to your suggestion. Free 'elems'.

> > (4) some coding alignments
> >
> > 4-1.
> >
> > + List *tables_viaroot = NIL,
> > ...
> > + *current_table = NIL;
> >
> > I suggest we can put some variables
> > into the condition for the first time call of this function, like tables_viaroot and
> > current_table.
> > When you agree, kindly change it.
Improve these according to your suggestions.
Also, I put the code for getting publication(s) into the condition for the
first time call of this function.

> > 4-2.
> >
> > + /*
> > + * Publications support partitioned tables, although
> > all changes
> > + * are replicated using leaf partition identity and
> > schema, so we
> > + * only need those.
> > + */
> > + if (publication->alltables)
> > + {
> > + current_table =
> > GetAllTablesPublicationRelations(publication->pubviaroot);
> > + }
> >
> > This is not related to the change itself and now we are inheriting the previous
> > curly brackets, but I think there's no harm in removing it, since it's only for one
> > statement.
Improve these according to your suggestions.

> Hi,
>
> One more thing I'd like to add is that
> we don't hit the below code by tests.
> In the HEAD v2, we add a new filtering logic in pg_get_publication_tables.
> Although I'm not sure if this is related to the bug fix itself,
> when we want to include it in this patch, then
> I feel it's better to add some simple test for this part,
> to cover all the new main paths and check if
> new logic works correctly.
>
>
> + /*
> + * If a partition table is published in a publication with viaroot,
> + * and its parent or child table is published in another publication
> + * without viaroot. Then we need to move the parent or child table
> + * from tables to tables_viaroot.
> + *
> + * If all publication(s)'s viaroot are the same, then skip this part.
> + */
>
> ....
> if (ancestor_viaroot == ancestor)
> + {
> + tables = foreach_delete_current(tables, lc2);
> + change_tables =
> list_append_unique_oid(change_tables,
> + relid);
> + }
Yes, I agree.
But when I was adding the test, I found we could improve this part.
So, I removed this part of the code.

Also rebase it because the change in HEAD(23e7b38).

Attach the patches.(Only changed the patch for HEAD.).
1. Improve the commit message. [suggestions by Osumi-san]
2. Improve coding alignments and the usage for SRFs. [suggestions by Osumi-san and I]
3. Simplify the modifications in function pg_get_publication_tables.

Regards,
Wang wei

Attachment Content-Type Size
HEAD_v3-0001-Fix-data-replicated-twice-when-specifying-PUBLISH.patch application/octet-stream 12.9 KB
REL14_v4-0001-Fix-data-replicated-twice-when-specifying-PUBLISH.patch application/octet-stream 5.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2022-05-13 02:02:31 Re: TRAP: FailedAssertion("tabstat->trans == trans", File: "pgstat_relation.c", Line: 508
Previous Message Peter Smith 2022-05-13 02:01:09 Re: recovery test failure on morepork with timestamp mystery