Re: Data is copied twice when specifying both child and parent table in publication

From: Greg Nancarrow <gregn4422(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Data is copied twice when specifying both child and parent table in publication
Date: 2021-10-20 09:00:18
Message-ID: CAJcOf-fHq5Mca2sf7MqckwkXGLfjqiKboKsDNnywC-jnvM_BBQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Oct 20, 2021 at 7:02 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> > Actually, at least with the scenario I gave steps for, after looking
> > at it again and debugging, I think that the behavior is understandable
> > and not a bug.
> > The reason is that the INSERTed data is first published through the
> > partitions, since initially there is no partitioned table in the
> > publication (so publish_via_partition_root=true doesn't have any
> > effect). But then adding the partitioned table to the publication and
> > refreshing the publication in the subscriber, the data is then
> > published "using the identity and schema of the partitioned table" due
> > to publish_via_partition_root=true. Note that the corresponding table
> > in the subscriber may well be a non-partitioned table (or the
> > partitions arranged differently) so the data does need to be
> > replicated again.
>
> I don't think this behavior is consistent. I mean, for the initial sync
> we will replicate the duplicate data, whereas for later streaming we
> will only replicate it once. From the user's POV, this behavior doesn't
> look correct.
>

The scenario I gave steps for didn't have any table data when the
subscription was made, so the initial sync did not replicate any data.
I was referring to the double-publish that occurs when
publish_via_partition_root=true is set, the partitioned table is then
added to the publication, and the subscriber runs ALTER SUBSCRIPTION
... REFRESH PUBLICATION.
However, if I modify my example to include both the partitioned table
and (explicitly) its child partitions in the publication, and insert
some data on the publisher side prior to creating the subscription,
then I do see duplicate data on the subscriber side after the initial
sync, and I would agree that this doesn't seem correct.
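
For illustration, the modified scenario might look something like the
sketch below. This is not the exact set of steps from the earlier mail:
the table, publication and subscription names are hypothetical, the
connection string is a placeholder, and publish_via_partition_root=true
is assumed to be carried over from the original example.

```sql
-- Publisher side (all object names are hypothetical)
CREATE TABLE parent (a int) PARTITION BY RANGE (a);
CREATE TABLE child PARTITION OF parent FOR VALUES FROM (0) TO (100);

-- Data exists before the subscription is created, so it is copied
-- during the initial table sync rather than streamed.
INSERT INTO parent VALUES (1), (2);

-- Both the partitioned table and its partition are listed explicitly.
CREATE PUBLICATION pub FOR TABLE parent, child
    WITH (publish_via_partition_root = true);

-- Subscriber side (same table layout assumed here for simplicity)
CREATE TABLE parent (a int) PARTITION BY RANGE (a);
CREATE TABLE child PARTITION OF parent FOR VALUES FROM (0) TO (100);

CREATE SUBSCRIPTION sub
    CONNECTION 'host=publisher_host dbname=postgres'  -- placeholder
    PUBLICATION pub;

-- After the initial sync finishes, check for duplicated rows.
SELECT a, count(*) FROM parent GROUP BY a ORDER BY a;
```

With the behavior discussed in this thread, the final query would be
expected to report a count greater than one for each inserted value.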

Regards,
Greg Nancarrow
Fujitsu Australia
