Re: Data is copied twice when specifying both child and parent table in publication

From: vignesh C <vignesh21(at)gmail(dot)com>
To: "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, "Takamichi Osumi (Fujitsu)" <osumi(dot)takamichi(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>
Subject: Re: Data is copied twice when specifying both child and parent table in publication
Date: 2022-11-13 16:55:48
Message-ID: CALDaNm3ZzNJ6Y1iX=PBYCcCU+EPm4=9LVXFin5gvQ9XkEzc8UQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 11 Nov 2022 at 11:13, wangw(dot)fnst(at)fujitsu(dot)com
<wangw(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Fri, Oct 21, 2022 at 17:02 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> >
>
> Thanks for your comments. Sorry for not replying in time.
>
> > On Mon, Oct 17, 2022 at 4:49 PM wangw(dot)fnst(at)fujitsu(dot)com
> > <wangw(dot)fnst(at)fujitsu(dot)com> wrote:
> > >
> > > On Wed, Oct 5, 2022 at 11:08 AM Peter Smith <smithpb2250(at)gmail(dot)com>
> > wrote:
> > > > Hi Wang-san. Here are my review comments for HEAD_v12-0001 patch.
> > >
> > ...
> > > >
> > > > 3. QUESTION - pg_get_publication_tables / fetch_table_list
> > > >
> > > > When the same table is published by different publications (but there
> > > > are other differences like row-filters/column-lists in each
> > > > publication) the result tuple of this function does not include the
> > > > pubid. Maybe the SQL of pg_publication_tables/fetch_table_list() is OK
> > > > as-is but how does it manage to associate each table with the correct
> > > > tuple?
> > > >
> > > > I know it apparently all seems to work but I’m not how does that
> > > > happen? Can you explain why a puboid is not needed for the result
> > > > tuple of this function?
> > >
> > > Sorry, I am not sure I understand your question.
> > > I try to answer your question by explaining the two functions you mentioned:
> > >
> > > First, the function pg_get_publication_tables gets the list (see table_infos)
> > > that included published table and the corresponding publication. Then based
> > > on this list, the function pg_get_publication_tables returns information
> > > (scheme, relname, row filter and column list) about the published tables in the
> > > publications list. It just doesn't return pubid.
> > >
> > > Then, the SQL in the function fetch_table_list will get the columns in the
> > > column list from pg_attribute. (This is to return all columns when the column
> > > list is not specified)
> > >
> >
> > I meant, for example, if the different publications specified
> > different col-lists for the same table then IIUC the
> > fetch_table_lists() is going to return 2 list elements
> > (schema,rel_name,row_filter,col_list). But when the schema/rel_name
> > are the same for 2 elements then (without also a pubid) how are you
> > going to know where the list element came from, and how come that is
> > not important?
> >
> > > > ~~
> > > >
> > > > test_pub=# create table t1(a int, b int, c int);
> > > > CREATE TABLE
> > > > test_pub=# create publication pub1 for table t1(a) where (a > 99);
> > > > CREATE PUBLICATION
> > > > test_pub=# create publication pub2 for table t1(a,b) where (b < 33);
> > > > CREATE PUBLICATION
> > > >
> > > > Following seems OK when I swap orders of publication names...
> > > >
> > > > test_pub=# SELECT gpt.relid, gpt.attrs, pg_get_expr(gpt.qual,
> > > > gpt.relid) AS rowfilter from pg_get_publication_tables(VARIADIC
> > > > ARRAY['pub2','pub1']) gpt(relid, attrs, qual);
> > > > relid | attrs | rowfilter
> > > > -------+-------+-----------
> > > > 16385 | 1 2 | (b < 33)
> > > > 16385 | 1 | (a > 99)
> > > > (2 rows)
> > > >
> > > > test_pub=# SELECT gpt.relid, gpt.attrs, pg_get_expr(gpt.qual,
> > > > gpt.relid) AS rowfilter from pg_get_publication_tables(VARIADIC
> > > > ARRAY['pub1','pub2']) gpt(relid, attrs, qual);
> > > > relid | attrs | rowfilter
> > > > -------+-------+-----------
> > > > 16385 | 1 | (a > 99)
> > > > 16385 | 1 2 | (b < 33)
> > > > (2 rows)
> > > >
> > > > But what about this (this is similar to the SQL fragment from
> > > > fetch_table_list); I swapped the pub names but the results are the
> > > > same...
> > > >
> > > > test_pub=# SELECT pg_get_publication_tables(VARIADIC
> > > > array_agg(p.pubname)) from pg_publication p where pubname
> > > > IN('pub2','pub1');
> > > >
> > > > pg_get_publication_tables
> > > >
> > > > ---------------------------------------------------------------------------------------------
> > ----
> > > > ---------------------------------------------------------------------
> > > > ---------------------------------------------------------------------------------------------
> > ----
> > > > ---------------------------------------------------------------------
> > > > -------------------------------------------------------------------
> > > > (16385,1,"{OPEXPR :opno 521 :opfuncid 147 :opresulttype 16 :opretset
> > > > false :opcollid 0 :inputcollid 0 :args ({VAR :varno 1 :varattno 1
> > > > :vartype 23 :vartypmod -1 :var
> > > > collid 0 :varlevelsup 0 :varnosyn 1 :varattnosyn 1 :location 47}
> > > > {CONST :consttype 23 :consttypmod -1 :constcollid 0 :constlen 4
> > > > :constbyval true :constisnull false :
> > > > location 51 :constvalue 4 [ 99 0 0 0 0 0 0 0 ]}) :location 49}")
> > > > (16385,"1 2","{OPEXPR :opno 97 :opfuncid 66 :opresulttype 16
> > > > :opretset false :opcollid 0 :inputcollid 0 :args ({VAR :varno 1
> > > > :varattno 2 :vartype 23 :vartypmod -1 :v
> > > > arcollid 0 :varlevelsup 0 :varnosyn 1 :varattnosyn 2 :location 49}
> > > > {CONST :consttype 23 :consttypmod -1 :constcollid 0 :constlen 4
> > > > :constbyval true :constisnull false
> > > > :location 53 :constvalue 4 [ 33 0 0 0 0 0 0 0 ]}) :location 51}")
> > > > (2 rows)
> > > >
> > > > test_pub=# SELECT pg_get_publication_tables(VARIADIC
> > > > array_agg(p.pubname)) from pg_publication p where pubname
> > > > IN('pub1','pub2');
> > > >
> > > > pg_get_publication_tables
> > > >
> > > > ---------------------------------------------------------------------------------------------
> > ----
> > > > ---------------------------------------------------------------------
> > > > ---------------------------------------------------------------------------------------------
> > ----
> > > > ---------------------------------------------------------------------
> > > > -------------------------------------------------------------------
> > > > (16385,1,"{OPEXPR :opno 521 :opfuncid 147 :opresulttype 16 :opretset
> > > > false :opcollid 0 :inputcollid 0 :args ({VAR :varno 1 :varattno 1
> > > > :vartype 23 :vartypmod -1 :var
> > > > collid 0 :varlevelsup 0 :varnosyn 1 :varattnosyn 1 :location 47}
> > > > {CONST :consttype 23 :consttypmod -1 :constcollid 0 :constlen 4
> > > > :constbyval true :constisnull false :
> > > > location 51 :constvalue 4 [ 99 0 0 0 0 0 0 0 ]}) :location 49}")
> > > > (16385,"1 2","{OPEXPR :opno 97 :opfuncid 66 :opresulttype 16
> > > > :opretset false :opcollid 0 :inputcollid 0 :args ({VAR :varno 1
> > > > :varattno 2 :vartype 23 :vartypmod -1 :v
> > > > arcollid 0 :varlevelsup 0 :varnosyn 1 :varattnosyn 2 :location 49}
> > > > {CONST :consttype 23 :consttypmod -1 :constcollid 0 :constlen 4
> > > > :constbyval true :constisnull false
> > > > :location 53 :constvalue 4 [ 33 0 0 0 0 0 0 0 ]}) :location 51}")
> > > > (2 rows)
> > >
> > > I think this is because the usage of SELECT statement. The order seems
> > depend
> > > on pg_publication. Such as:
> > >
> > > postgres=# SELECT array_agg(p.pubname) FROM pg_publication p WHERE
> > pubname IN ('pub1','pub2');
> > > array_agg
> > > -------------
> > > {pub1,pub2}
> > > (1 row)
> > >
> > > postgres=# SELECT array_agg(p.pubname) FROM pg_publication p WHERE
> > pubname IN ('pub2','pub1');
> > > array_agg
> > > -------------
> > > {pub1,pub2}
> > > (1 row)
> > >
> >
> > Right, so I felt it was a bit dubious that the result of the function
> > “seems to depend on” something. That’s why I was asking why the list
> > elements did not include a pubid. Then a caller could be certain what
> > element belonged with what publication. It's not quite clear to me why
> > that is not important for this patch - but anyway, even if it's not
> > necessary for this patch's usage, this is a function that is exposed
> > to users who might have different needs/expectations than this patch
> > has, so shouldn't the result be less fuzzy for them?
>
> Yes, I agree that there may be such a need in the future. Added 'pubid' to the
> output of this function.
> BTW, I think the usage of the function pg_get_publication_tables is not
> documented in the pg-doc now, it doesn't seem to be a function provided to
> users. So I didn't modify the documentation.
>
> Attach new patches.

Here we are having tables list to store the relids and table_infos
list which stores pubid along with relid. Here tables list acts as a
temporary list to get filter_partitions and then delete the
published_rel from table_infos. Will it be possible to directly
operate on table_infos list and remove the temporary tables list used.
We might have to implement comparator, deduplication functions and
change filter_partitions function to work directly on published_rel
type list.
+ /
+ * Record the published table and the
corresponding publication so
+ * that we can get row filters and column list later.
+ *
+ * When a table is published by multiple
publications, to obtain
+ * all row filters and column list, the
structure related to this
+ * table will be recorded multiple times.
+ */
+ foreach(lc, pub_elem_tables)
+ {
+ published_rel *table_info =
(published_rel *) malloc(sizeof(published_rel));
+
+ table_info->relid = lfirst_oid(lc);
+ table_info->pubid = pub_elem->oid;
+ table_infos = lappend(table_infos, table_info);
+ }
+
+ tables = list_concat(tables, pub_elem_tables);

Thoughts?

Regards,
Vignesh

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2022-11-13 17:59:30 Re: Schema variables - new implementation for Postgres 15
Previous Message Simon Riggs 2022-11-13 16:51:50 Re: Reducing power consumption on idle servers