Re: Data is copied twice when specifying both child and parent table in publication

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Greg Nancarrow <gregn4422(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Data is copied twice when specifying both child and parent table in publication
Date: 2021-10-20 08:02:02
Message-ID: CAFiTN-seUmrMSm8Z4cNE7H2u=N0=L2OY4hCeFxRwf4YSa5zCqA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Oct 20, 2021 at 12:44 PM Greg Nancarrow <gregn4422(at)gmail(dot)com> wrote:
>
> On Mon, Oct 18, 2021 at 5:00 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > > I have not debugged it yet to find out why, but with the patch
> > > applied, the original double-publish problem that I reported
> > > (converted to just use TABLE rather than ALL TABLES IN SCHEMA) still
> > > occurs.
> > >
> >
> > Yeah, I think this is a variant of the problem being fixed by
> > Hou-San's patch. I think one possible idea to investigate is that on
> > the subscriber-side, after fetching tables, we check the already
> > subscribed tables and if the child tables already exist then we ignore
> > the parent table and vice versa. We might want to consider the case
> > where a user has toggled the "publish_via_partition_root" parameter.
> >
> > It seems both these behaviours/problems exist since commit 17b9e7f9
> > (Support adding partitioned tables to publication). Adding Amit L and
> > Peter E (people involved in this work) to know their opinion?
> >
>
> Actually, at least with the scenario I gave steps for, after looking
> at it again and debugging, I think that the behavior is understandable
> and not a bug.
> The reason is that the INSERTed data is first published through the
> partitions, since initially there is no partitioned table in the
> publication (so publish_via_partition_root=true doesn't have any
> effect). But then adding the partitioned table to the publication and
> refreshing the publication in the subscriber, the data is then
> published "using the identity and schema of the partitioned table" due
> to publish_via_partition_root=true. Note that the corresponding table
> in the subscriber may well be a non-partitioned table (or the
> partitions arranged differently) so the data does need to be
> replicated again.

I don't think this behavior is consistent. For the initial sync we
will replicate the data twice, whereas for later streaming we will
replicate it only once. From the user's POV, this behavior doesn't
look correct.
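
To make the inconsistency concrete, here is a toy Python model. It is
only a sketch and not PostgreSQL code: the table names, the PARTITIONS
map, and both functions are hypothetical, and the model simply assumes
that the initial sync copies each published table independently while
a streamed change is sent once under a single identity.

```python
from collections import Counter

# Hypothetical partition layout for the toy model: one root with two leaves.
PARTITIONS = {"parent": ["child1", "child2"]}


def root_of(table):
    """Return the partition root of a table, or the table itself."""
    for root, leaves in PARTITIONS.items():
        if table in leaves:
            return root
    return table


def initial_sync(published, rows_by_leaf):
    """Copy every published table on its own (assumed model of initial sync).

    Copying a partitioned table copies the rows of all its leaves, so a
    leaf that is also published separately is copied a second time.
    """
    received = Counter()
    for table in published:
        for leaf in PARTITIONS.get(table, [table]):
            received.update(rows_by_leaf.get(leaf, []))
    return received


def stream_change(published, leaf, row, via_root=True):
    """Send one streamed change under exactly one identity (root or leaf)."""
    received = Counter()
    root = root_of(leaf)
    target = root if via_root and root in published else leaf
    if target in published:
        received[row] += 1
    return received


published = ["parent", "child1"]
existing = {"child1": ["r1"], "child2": ["r2"]}

synced = initial_sync(published, existing)
streamed = stream_change(published, "child1", "r3")
print(synced["r1"], synced["r2"], streamed["r3"])
```

In this model the pre-existing row in child1 arrives twice during the
initial sync, while a row inserted afterwards arrives once, which is
the mismatch I am pointing at.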

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
