Re: Data is copied twice when specifying both child and parent table in publication

From: Greg Nancarrow <gregn4422(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Data is copied twice when specifying both child and parent table in publication
Date: 2021-10-20 09:33:17
Message-ID: CAJcOf-fv7tEv=N+LZo9H1fp1A7NB9wsWDDMw048XNy2fyESgnw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Oct 20, 2021 at 7:59 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> > > Actually, at least with the scenario I gave steps for, after looking
> > > at it again and debugging, I think that the behavior is understandable
> > > and not a bug.
> > > The reason is that the INSERTed data is first published through the
> > > partitions, since initially there is no partitioned table in the
> > > publication (so publish_via_partition_root=true doesn't have any
> > > effect). But then adding the partitioned table to the publication and
> > > refreshing the publication in the subscriber, the data is then
> > > published "using the identity and schema of the partitioned table" due
> > > to publish_via_partition_root=true. Note that the corresponding table
> > > in the subscriber may well be a non-partitioned table (or the
> > > partitions arranged differently) so the data does need to be
> > > replicated again.
> >
>
> Even if the partitions are arranged differently why would the user
> expect the same data to be replicated twice?
>

It's the same data, but published in different ways because of changes
the user made to the publication.
I'm not talking in general; I am specifically referring to the
scenario I gave steps for.
In the example scenario I gave, when the subscription was initially
created, the publication explicitly included only the partitions, but
publish_via_partition_root was true. So in this case changes are
published through the individual partitions (as no partitioned table is
present in the publication). Then, on the publisher side, the
partitioned table was added to the publication, and
ALTER SUBSCRIPTION ... REFRESH PUBLICATION was run on the subscriber
side. Now that the partitioned table is present in the publication and
publish_via_partition_root is true, the data is "published using the
identity and schema of the partitioned table rather than that of the
individual partitions that are actually changed". So the data is
replicated again.
This scenario didn't use initial table data, so initial table sync
didn't come into play (although, as I previously posted, I can see a
double-publish issue on initial sync if data is inserted into the table
before the subscription is created and the partitions have been
explicitly added to the publication).
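To make the rule in the scenario concrete, here is a toy Python model (not PostgreSQL code; the function published_as and the table names tab, tab_1, tab_2 are hypothetical) of which relation a change to a partition is published under. It assumes, following the documentation wording quoted above, that with publish_via_partition_root = true a change goes out under the topmost ancestor that is itself in the publication, and otherwise under the partition's own identity.

```python
from typing import Dict, List, Optional, Set


def published_as(changed: str,
                 parent_of: Dict[str, Optional[str]],
                 publication: Set[str],
                 via_root: bool) -> Optional[str]:
    """Toy model (assumption, not PostgreSQL source): return the relation
    a change to partition `changed` is published under, or None."""
    # Collect ancestors from the immediate parent up to the root.
    ancestors: List[str] = []
    rel = parent_of.get(changed)
    while rel is not None:
        ancestors.append(rel)
        rel = parent_of.get(rel)
    published_ancestors = [a for a in ancestors if a in publication]
    if via_root and published_ancestors:
        # publish_via_partition_root = true and some ancestor is published:
        # use the topmost published ancestor's identity and schema.
        return published_ancestors[-1]
    if changed in publication or published_ancestors:
        # Otherwise the change goes out via the partition itself.
        return changed
    return None  # not covered by the publication at all


# Hypothetical tree: partitioned table "tab" with partitions tab_1, tab_2.
parent_of = {"tab_1": "tab", "tab_2": "tab", "tab": None}

# Step 1: publication lists only the partitions; via_root is true.
pub = {"tab_1", "tab_2"}
before = published_as("tab_1", parent_of, pub, via_root=True)

# Step 2: the partitioned table is added to the publication
# (the subscriber then runs ALTER SUBSCRIPTION ... REFRESH PUBLICATION).
pub.add("tab")
after = published_as("tab_1", parent_of, pub, via_root=True)

print(before, after)  # tab_1 tab
```

In this model the same change to tab_1 is published under a different relation once tab is in the publication, which is the shift in identity described in the scenario.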

Regards,
Greg Nancarrow
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ronan Dunklau 2021-10-20 09:40:18 Re: pg_receivewal starting position
Previous Message Amit Kapila 2021-10-20 09:17:45 Re: LogicalChanges* and LogicalSubxact* wait events are never reported