RE: Data is copied twice when specifying both child and parent table in publication

From: "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>
To: Jacob Champion <jchampion(at)timescale(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, vignesh C <vignesh21(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "Takamichi Osumi (Fujitsu)" <osumi(dot)takamichi(at)fujitsu(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>
Subject: RE: Data is copied twice when specifying both child and parent table in publication
Date: 2023-03-31 10:04:10
Message-ID: OSZPR01MB6310829049C935E1F8D5344FFD8F9@OSZPR01MB6310.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

On Fri, Mar 31, 2023 2:16 AM Jacob Champion <jchampion(at)timescale(dot)com> wrote:
>
> On Wed, Mar 29, 2023 at 2:00 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> > Pushed.
>
> While rebasing my logical-roots patch over the top of this, I ran into
> another situation where mixed viaroot settings can duplicate data. The
> key idea is to subscribe to two publications with mixed settings, as
> before, and add a partition root that's already been replicated with
> viaroot=false to the other publication with viaroot=true.
>
> pub=# CREATE TABLE part (a int) PARTITION BY RANGE (a);
> pub=# CREATE PUBLICATION pub_all FOR ALL TABLES;
> pub=# CREATE PUBLICATION pub_other FOR TABLE other WITH
> (publish_via_partition_root);
> -- populate with data, then switch to subscription side
> sub=# CREATE SUBSCRIPTION sub CONNECTION ... PUBLICATION pub_all,
> pub_other;
> -- switch back to publication
> pub=# ALTER PUBLICATION pub_other ADD TABLE part;
> -- and back to subscription
> sub=# ALTER SUBSCRIPTION sub REFRESH PUBLICATION;
> -- data is now duplicated
>
> (Standalone reproduction attached.)
>
> This is similar to what happens if you alter the
> publish_via_partition_root setting for an existing publication, but
> I'd argue it's easier to hit by accident. Is this part of the same
> class of bugs, or is it different (or even expected) behavior?
>

I noticed that a similar problem has already been discussed in this thread; see
[1] [2] [3] [4]. Fixing it in code, by automatically skipping tables that have
already been synchronized, seems complicated, and it may be overkill in some
cases (e.g. the target table on the subscriber is not a partitioned table, and
the user wants to synchronize all data in the partitioned table from the
publisher). Besides, this does not seem to be a common case, so I'm not sure we
should fix it. Maybe we can just add some documentation for it, as Peter
mentioned.
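
For anyone who wants to check whether a setup is exposed to this, here is a
rough sketch of what one could look at. It is only an illustration and not
part of any patch: the first query runs on the publisher and lists each
publication's publish_via_partition_root setting (pubviaroot in
pg_publication), and the second runs on the subscriber and lists the tables
each subscription tracks (pg_subscription_rel).

```sql
-- On the publisher: a subscription that combines publications with
-- different pubviaroot values is the "mixed settings" case above.
SELECT pubname, puballtables, pubviaroot
FROM pg_publication
ORDER BY pubname;

-- On the subscriber: tables tracked per subscription, with sync state.
SELECT s.subname,
       sr.srrelid::regclass AS table_name,
       sr.srsubstate
FROM pg_subscription_rel sr
JOIN pg_subscription s ON s.oid = sr.srsubid
ORDER BY s.subname, sr.srrelid;
```

If the second query shows both a partitioned table and its leaf partitions
for the same subscription, that may indicate the data was copied through
both paths, though I have not verified this for every case.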

[1] https://www.postgresql.org/message-id/CAJcOf-eQR_%3Dq0f4ZVHd342QdLvBd_995peSr4xCU05hrS3TeTg%40mail.gmail.com
[2] https://www.postgresql.org/message-id/OS0PR01MB5716C756312959F293A822C794869%40OS0PR01MB5716.jpnprd01.prod.outlook.com (the second issue in it)
[3] https://www.postgresql.org/message-id/CA%2BHiwqHnDHcT4OOcga9rDFyc7TvDrpN5xFH9J2pyHQo9ptvjmQ%40mail.gmail.com
[4] https://www.postgresql.org/message-id/CAA4eK1%2BNWreG%3D2sKiMz8vFzTsFhEHCjgQMyAu6zj3sdLmcheYg%40mail.gmail.com

Regards,
Shi Yu
