Re: Data is copied twice when specifying both child and parent table in publication

From: Jacob Champion <jchampion(at)timescale(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, vignesh C <vignesh21(at)gmail(dot)com>, "Takamichi Osumi (Fujitsu)" <osumi(dot)takamichi(at)fujitsu(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>
Subject: Re: Data is copied twice when specifying both child and parent table in publication
Date: 2023-03-31 23:04:37
Message-ID: 3a4504df-3f8e-c21a-fff4-cb7a6d531c57@timescale.com
Lists: pgsql-hackers

On 3/30/23 20:01, Peter Smith wrote:
> For example, just imagine if logic could be made smarter to recognize
> that since there was already the 'part_def' being subscribed so it
> should NOT use the default 'copy_data=true' when the REFRESH launches
> the ancestor table 'part'...
>
> Even if that logic was implemented, I have a feeling you could *still*
> run into problems if the 'part' table was made of multiple partitions.
> I think you might get to a situation where you DO want some partition
> data copied (because you did not have it yet but now you are
> subscribing to the root you want it) while at the same time, you DON'T
> want to get duplicated data from other partitions (because you already
> knew about those ones -- like your example does).

Hm, okay. My interest here is mainly because my logical-roots proposal
generalizes the problem (and therefore makes it worse).

For what it's worth, that patchset introduces the ability for the
subscriber to sync multiple tables into one. I wonder if that could be
used somehow to help fix this problem too?

> At least, we need to check there are sufficient "BE CAREFUL" warnings
> in the documentation for scenarios like this.

Agreed. These are sharp edges.

Thanks,
--Jacob
