From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com> |
Cc: | "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com> |
Subject: | Re: Data is copied twice when specifying both child and parent table in publication |
Date: | 2022-04-21 09:41:00 |
Message-ID: | CAA4eK1LWYqCyDM1_6wZETS0fZ79u+ciU8MAB3eu+R9Keg96rUg@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, Apr 19, 2022 at 2:23 PM shiy(dot)fnst(at)fujitsu(dot)com
<shiy(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Tue, Apr 19, 2022 3:05 PM houzj(dot)fnst(at)fujitsu(dot)com <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > One suggestion: can we simplify the code by moving the logic of checking
> > the ancestor into the SQL? For example, we could filter the output of
> > pg_publication_tables by adding a WHERE clause which checks whether the table
> > is a partition and whether its ancestor is also in the output. I think we can
> > also filter out the needless partitions with this approach.
> >
>
> I agree with you, and I tried to fix this problem in a simpler way. What we want
> is to exclude any table whose ancestor also needs to be
> replicated, so how about implementing that by using the following SQL when
> getting the table list from the publisher?
>
> SELECT DISTINCT ns.nspname, c.relname
> FROM pg_catalog.pg_publication_tables t
> JOIN pg_catalog.pg_namespace ns ON ns.nspname = t.schemaname
> JOIN pg_catalog.pg_class c ON c.relname = t.tablename AND c.relnamespace = ns.oid
> WHERE t.pubname IN ('p0','p2')
> AND (c.relispartition IS FALSE OR NOT EXISTS (SELECT 1 FROM pg_partition_ancestors(c.oid)
> WHERE relid IN ( SELECT DISTINCT (schemaname||'.'||tablename)::regclass::oid
> FROM pg_catalog.pg_publication_tables t
> WHERE t.pubname IN ('p0','p2') ) AND relid != c.oid));
>
> Please find the attached patch, which uses this approach; I also merged the test
> from Wang's patch into it.
>
I think this will work but do we need "... relid != c.oid" at the end
of the query? If so, why? Please use an alias for
pg_partition_ancestors to make the statement understandable.
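
A sketch of what the aliased form could look like. The aliases anc and pt are
names chosen here for illustration only; apart from the aliases and the
layout, the query is the one quoted above:

```sql
SELECT DISTINCT ns.nspname, c.relname
FROM pg_catalog.pg_publication_tables t
JOIN pg_catalog.pg_namespace ns ON ns.nspname = t.schemaname
JOIN pg_catalog.pg_class c ON c.relname = t.tablename AND c.relnamespace = ns.oid
WHERE t.pubname IN ('p0', 'p2')
  AND (c.relispartition IS FALSE
       OR NOT EXISTS (
           -- anc: the rows returned by pg_partition_ancestors for this table
           SELECT 1
           FROM pg_catalog.pg_partition_ancestors(c.oid) AS anc(relid)
           WHERE anc.relid != c.oid
             AND anc.relid IN (
                 -- pt: every table published by the same publications
                 SELECT (pt.schemaname || '.' || pt.tablename)::regclass::oid
                 FROM pg_catalog.pg_publication_tables pt
                 WHERE pt.pubname IN ('p0', 'p2'))));
```

Per the documentation, pg_partition_ancestors lists the given relation itself
along with its ancestors, which may be what the final condition is meant to
exclude.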

Now, this solution will work, but I find the query a bit complex, and it
will add some overhead because we are calling pg_publication_tables
multiple times. So, I was wondering whether we could have a new function,
pg_get_publication_tables, which takes multiple publications as input
and returns the list of qualified tables. I think for the back branches we
need something along the lines of what you have proposed, but for HEAD we
can have a better solution.

IIRC, the column list and row filter features also have some issues for
exactly this reason, so I would like those cases to be mentioned here as
well, and tests for them should probably be included in the patch for HEAD.
--
With Regards,
Amit Kapila.