Re: Data is copied twice when specifying both child and parent table in publication

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Greg Nancarrow <gregn4422(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Data is copied twice when specifying both child and parent table in publication
Date: 2021-10-21 09:45:35
Message-ID: CAA4eK1L6L_c8Fzn0PUnmWpy373qKGCzmoqhct1TrbKsuzJgW0A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Oct 20, 2021 at 7:11 PM Greg Nancarrow <gregn4422(at)gmail(dot)com> wrote:
>
> On Wed, Oct 20, 2021 at 9:19 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > I don't see why data need to be replicated again even in that case.
> > Can you see any such duplicate data replicated for non-partitioned
> > tables?
> >
>
> If my example is slightly modified to use the same-named tables on the
> subscriber side, but without partitioning, i.e.:
>
> PUB:
>
> CREATE SCHEMA sch;
> CREATE SCHEMA sch1;
> CREATE TABLE sch.sale (sale_date date not null, country_code text,
> product_sku text, units integer) PARTITION BY RANGE (sale_date);
> CREATE TABLE sch1.sale_201901 PARTITION OF sch.sale FOR VALUES FROM
> ('2019-01-01') TO ('2019-02-01');
> CREATE TABLE sch1.sale_201902 PARTITION OF sch.sale FOR VALUES FROM
> ('2019-02-01') TO ('2019-03-01');
>
>
> SUB:
>
> CREATE SCHEMA sch;
> CREATE SCHEMA sch1;
> CREATE TABLE sch.sale (sale_date date not null, country_code text,
> product_sku text, units integer);
> CREATE TABLE sch1.sale_201901 (sale_date date not null, country_code
> text, product_sku text, units integer);
> CREATE TABLE sch1.sale_201902 (sale_date date not null, country_code
> text, product_sku text, units integer);
>
> then the INSERTed data on the publisher side gets replicated to the
> subscriber's "sch1.sale_201901" and "sch1.sale_201902" tables (only),
> depending on the date values.
> Now if the partitioned table is then added to the publication and
> ALTER SUBSCRIPTION ... REFRESH PUBLICATION done by the subscriber,
> then the current functionality is that the existing sch.sale data is
> replicated (only) to the subscriber's "sch.sale" table (even though
> data had been replicated previously to the "sch1.sale_201901" and
> "sch1.sale_201902" tables, only).
> So, just to be clear, you think that this current functionality isn't
> correct (i.e. no data should be replicated on the REFRESH in this
> case)?
>

Right, I don't think it is correct because it will behave differently
when the tables on the subscriber are partitioned. Also, the idea I
speculated in one of my above emails should be able to deal with this
case.

> I think it's debatable because here copy_data=true and sch.sale was
> not a previously-subscribed table (so pre-existing data in that table
> should be copied, in accordance with the current documentation).
>

What about the partition (child) table? In this case, the same data
will be present in two tables sch.sale and sch1.sale_201901 after you
have refreshed the publication, and then any future insertions will
only be inserted into parent table sch.sale in this case which doesn't
sound consistent. The bigger problem is that it will lead to duplicate
data when tables are partitioned. I think if the user really wants to
do in a way you are describing, there is no need to keep sub-tables
(*_201901 and *__201902). I understand that it depends on the use case
but we should also behave sanely when tables/partitions are created in
the same way in both publisher and subscriber which I guess will most
likely be the case.

--
With Regards,
Amit Kapila.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Nancarrow 2021-10-21 09:59:02 Re: Added schema level support for publication.
Previous Message wenjing 2021-10-21 09:25:31 Re: [Proposal] Global temporary tables