RE: Data is copied twice when specifying both child and parent table in publication

From: "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, "Takamichi Osumi (Fujitsu)" <osumi(dot)takamichi(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>
Subject: RE: Data is copied twice when specifying both child and parent table in publication
Date: 2022-11-17 07:43:38
Message-ID: OS3PR01MB6275470131128E4C371E0F7C9E069@OS3PR01MB6275.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thurs, Nov 17, 2022 at 13:58 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> On Wed, 16 Nov 2022 at 14:28, wangw(dot)fnst(at)fujitsu(dot)com
> <wangw(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > On Mon, Nov 14, 2022 at 0:56 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > > >
> > > > Attach new patches.
> > >
> >
> > Thanks for your comments.
> >
> > > Here we are having tables list to store the relids and table_infos
> > > list which stores pubid along with relid. Here tables list acts as a
> > > temporary list to get filter_partitions and then delete the
> > > published_rel from table_infos. Will it be possible to directly
> > > operate on table_infos list and remove the temporary tables list used.
> > > We might have to implement comparator, deduplication functions and
> > > change filter_partitions function to work directly on published_rel
> > > type list.
> > > + /
> > > + * Record the published table and the
> > > corresponding publication so
> > > + * that we can get row filters and column list later.
> > > + *
> > > + * When a table is published by multiple
> > > publications, to obtain
> > > + * all row filters and column list, the
> > > structure related to this
> > > + * table will be recorded multiple times.
> > > + */
> > > + foreach(lc, pub_elem_tables)
> > > + {
> > > + published_rel *table_info =
> > > (published_rel *) malloc(sizeof(published_rel));
> > > +
> > > + table_info->relid = lfirst_oid(lc);
> > > + table_info->pubid = pub_elem->oid;
> > > + table_infos = lappend(table_infos, table_info);
> > > + }
> > > +
> > > + tables = list_concat(tables, pub_elem_tables);
> > >
> > > Thoughts?
> >
> > I think we could only deduplicate published tables per publication to get all
> > row filters and column lists for each published table later.
> > I removed the temporary list 'tables' and modified the API of the function
> > filter_partitions to handle published_rel type list.
> >
> > Attach the new patch set.
>
> Thanks for the update patch.

Thanks for your comment.

> One suggestion:
> +/* Records association between publication and published table */
> +typedef struct
> +{
> + Oid relid; /* OID of published table */
> + Oid pubid; /* OID of publication
> that publishes this
> + * table. */
> +} published_rel;
> +
>
> + /*
> + * Record the published table and the
> corresponding publication so
> + * that we can get row filters and column list later.
> + *
> + * When a table is published by multiple
> publications, to obtain
> + * all row filters and column list, the
> structure related to this
> + * table will be recorded multiple times.
> + */
> + foreach(lc, pub_elem_tables)
> + {
> + published_rel *table_info =
> (published_rel *) malloc(sizeof(published_rel));
> +
> + table_info->relid = lfirst_oid(lc);
> + table_info->pubid = pub_elem->oid;
> + table_infos = lappend(table_infos, table_info);
> + }
>
> In this format if there are n relations in publication we will store
> pubid n times, in all tables publication there will many thousands of
> tables. We could avoid storing the pubid for every relid, instead we
> could represent it like below to avoid storing publication id for
> each tables:
>
> +/* Records association between publication and published tables */
> +typedef struct
> +{
> + List *relids, /* OIDs of the publisher tables */
> + Oid pubid; /* OID of publication that publishes this
> + * tables. */
> +}published_rel;
>
> Thoughts?

I think this complicates the function filter_partitions.
Because if we use such a node type, I think we need to concatenate 'relids'
list of each node of the 'table_infos' list in the function filter_partitions
to become a temporary list. Then filter this temporary list and process the
'table_infos' list according to the filtering result.

Regards,
Wang wei

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2022-11-17 07:55:16 Re: Reducing power consumption on idle servers
Previous Message Bharath Rupireddy 2022-11-17 07:36:23 Re: Reducing power consumption on idle servers