Re: Data is copied twice when specifying both child and parent table in publication

From: vignesh C <vignesh21(at)gmail(dot)com>
To: "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, "Takamichi Osumi (Fujitsu)" <osumi(dot)takamichi(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>
Subject: Re: Data is copied twice when specifying both child and parent table in publication
Date: 2022-11-17 05:57:58
Message-ID: CALDaNm3oUXw_YA4PeQrUAxFupkgye+e2LZbJxVOY9uCCvg2Jng@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 16 Nov 2022 at 14:28, wangw(dot)fnst(at)fujitsu(dot)com
<wangw(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Mon, Nov 14, 2022 at 0:56 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > >
> > > Attach new patches.
> >
>
> Thanks for your comments.
>
> > Here we are having tables list to store the relids and table_infos
> > list which stores pubid along with relid. Here tables list acts as a
> > temporary list to get filter_partitions and then delete the
> > published_rel from table_infos. Will it be possible to directly
> > operate on table_infos list and remove the temporary tables list used.
> > We might have to implement comparator, deduplication functions and
> > change filter_partitions function to work directly on published_rel
> > type list.
> > + /
> > + * Record the published table and the
> > corresponding publication so
> > + * that we can get row filters and column list later.
> > + *
> > + * When a table is published by multiple
> > publications, to obtain
> > + * all row filters and column list, the
> > structure related to this
> > + * table will be recorded multiple times.
> > + */
> > + foreach(lc, pub_elem_tables)
> > + {
> > + published_rel *table_info =
> > (published_rel *) malloc(sizeof(published_rel));
> > +
> > + table_info->relid = lfirst_oid(lc);
> > + table_info->pubid = pub_elem->oid;
> > + table_infos = lappend(table_infos, table_info);
> > + }
> > +
> > + tables = list_concat(tables, pub_elem_tables);
> >
> > Thoughts?
>
> I think we could only deduplicate published tables per publication to get all
> row filters and column lists for each published table later.
> I removed the temporary list 'tables' and modified the API of the function
> filter_partitions to handle published_rel type list.
>
> Attach the new patch set.

Thanks for the update patch.
One suggestion:
+/* Records association between publication and published table */
+typedef struct
+{
+ Oid relid; /* OID of published table */
+ Oid pubid; /* OID of publication
that publishes this
+ * table. */
+} published_rel;
+

+ /*
+ * Record the published table and the
corresponding publication so
+ * that we can get row filters and column list later.
+ *
+ * When a table is published by multiple
publications, to obtain
+ * all row filters and column list, the
structure related to this
+ * table will be recorded multiple times.
+ */
+ foreach(lc, pub_elem_tables)
+ {
+ published_rel *table_info =
(published_rel *) malloc(sizeof(published_rel));
+
+ table_info->relid = lfirst_oid(lc);
+ table_info->pubid = pub_elem->oid;
+ table_infos = lappend(table_infos, table_info);
+ }

In this format if there are n relations in publication we will store
pubid n times, in all tables publication there will many thousands of
tables. We could avoid storing the pubid for every relid, instead we
could represent it like below to avoid storing publication id for
each tables:

+/* Records association between publication and published tables */
+typedef struct
+{
+ List *relids, /* OIDs of the publisher tables */
+ Oid pubid; /* OID of publication that publishes this
+ * tables. */
+}published_rel;

Thoughts?

Regards,
Vignesh

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2022-11-17 06:18:05 Re: Supporting TAP tests with MSVC and Windows
Previous Message Waithant Myo (Fujitsu) 2022-11-17 05:16:47 RE: Fix the README file for MERGE command