Re: Data is copied twice when specifying both child and parent table in publication

From: Greg Nancarrow <gregn4422(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Amit Langote <amitlangote09(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Data is copied twice when specifying both child and parent table in publication
Date: 2021-10-20 07:14:07
Message-ID: CAJcOf-fZTvpQ8X0ZtZbR4fCDAXmuXdSsFYvyRLmCY5tN1QDF8w@mail.gmail.com
Lists: pgsql-hackers

On Mon, Oct 18, 2021 at 5:00 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> > I have not debugged it yet to find out why, but with the patch
> > applied, the original double-publish problem that I reported
> > (converted to just use TABLE rather than ALL TABLES IN SCHEMA) still
> > occurs.
> >
>
> Yeah, I think this is a variant of the problem being fixed by
> Hou-San's patch. I think one possible idea to investigate is that on
> the subscriber-side, after fetching tables, we check the already
> subscribed tables and if the child tables already exist then we ignore
> the parent table and vice versa. We might want to consider the case
> where a user has toggled the "publish_via_partition_root" parameter.
>
> It seems both these behaviours/problems exist since commit 17b9e7f9
> (Support adding partitioned tables to publication). Adding Amit L and
> Peter E (people involved in this work) to know their opinion?
>
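The subscriber-side idea quoted above can be sketched as a small filter. It assumes the subscriber has a child-to-parent map of the partition hierarchy. A table fetched from the publisher is skipped if any of its ancestors or descendants is already subscribed. The names `ancestors` and `tables_to_add` and the example tables are hypothetical, and this is not PostgreSQL code. It also ignores the publish_via_partition_root toggling case raised in the quote.

```python
"""Hypothetical sketch of the quoted idea: when refreshing, skip a fetched
parent if its partitions are already subscribed, and vice versa."""


def ancestors(table, parent):
    """Yield every ancestor of `table`, given a child -> parent mapping."""
    rel = parent.get(table)
    while rel is not None:
        yield rel
        rel = parent.get(rel)


def tables_to_add(fetched, subscribed, parent):
    """Return the fetched tables that should be newly added to the subscription."""
    kept = []
    for table in fetched:
        if table in subscribed:
            continue  # already part of the subscription; nothing to add
        # An ancestor of this table is already subscribed.
        parent_covered = any(a in subscribed for a in ancestors(table, parent))
        # This table is an ancestor of something already subscribed.
        child_covered = any(table in ancestors(s, parent) for s in subscribed)
        if not (parent_covered or child_covered):
            kept.append(table)
    return kept


if __name__ == "__main__":
    hierarchy = {"tab_1": "tab", "tab_2": "tab"}
    # Partitions already subscribed: the parent is ignored.
    print(tables_to_add(["tab", "tab_1", "tab_2", "other"], {"tab_1", "tab_2"}, hierarchy))
    # -> ['other']
    # Parent already subscribed: the partition is ignored.
    print(tables_to_add(["tab_1"], {"tab"}, hierarchy))
    # -> []
```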

Actually, at least for the scenario I gave steps for, after looking
at it again and debugging it, I think the behavior is understandable
and not a bug.
The reason is that the INSERTed data is first published through the
partitions, since initially there is no partitioned table in the
publication (so publish_via_partition_root=true doesn't have any
effect). But after the partitioned table is added to the publication
and the publication is refreshed on the subscriber, the data is
published "using the identity and schema of the partitioned table" due
to publish_via_partition_root=true. Note that the corresponding table
on the subscriber may well be a non-partitioned table (or have its
partitions arranged differently), so the data does need to be
replicated again.
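The sequence described above can be modelled in a few lines. This is a hypothetical, simplified model, not PostgreSQL's implementation. It assumes that an initial sync of a partitioned table ships the rows of all its leaf partitions, that a refresh only syncs tables newly added to the publication, and that with publish_via_partition_root the topmost published ancestor's identity is used. The table names `tab`, `tab_1` and `tab_2` are made up.

```python
"""Toy model: rows replicated first through the partitions, then copied
again through the parent once it is added to the publication with
publish_via_partition_root = true. Names and data are hypothetical."""

PARENT = {"tab_1": "tab", "tab_2": "tab"}  # child -> parent on the publisher
ROWS = {"tab_1": [1, 2], "tab_2": [11, 12]}  # rows in each leaf partition


def leaves(table):
    """Leaf partitions whose rows an initial copy of `table` includes."""
    children = [c for c, p in PARENT.items() if p == table]
    if not children:
        return [table]
    return [leaf for child in children for leaf in leaves(child)]


def publish_as(leaf, pub_tables, via_root):
    """Relation whose identity is used when publishing changes to `leaf`."""
    if via_root:
        chosen, rel = None, leaf
        while rel is not None:  # walk up; keep the topmost published relation
            if rel in pub_tables:
                chosen = rel
            rel = PARENT.get(rel)
        return chosen
    return leaf if leaf in pub_tables else None


def initial_copy(table):
    """Rows shipped by the initial table synchronization of `table`."""
    return [row for leaf in leaves(table) for row in ROWS[leaf]]


def sync_new_tables(pub_tables, synced, received):
    """Refresh: run the initial sync only for tables not yet synced."""
    for table in sorted(pub_tables - synced):
        received.extend(initial_copy(table))
        synced.add(table)


def run_scenario():
    received, synced, via_root = [], set(), True

    # Step 1: only the partitions are published; via_root has no effect.
    pub_tables = {"tab_1", "tab_2"}
    sync_new_tables(pub_tables, synced, received)
    before = publish_as("tab_1", pub_tables, via_root)

    # Step 2: add the partitioned table, then refresh the subscription.
    pub_tables.add("tab")
    sync_new_tables(pub_tables, synced, received)
    after = publish_as("tab_1", pub_tables, via_root)

    return before, after, received


if __name__ == "__main__":
    before, after, received = run_scenario()
    print(before, after)  # tab_1 tab
    print(received)       # [1, 2, 11, 12, 1, 2, 11, 12]
```

In this model the second copy is not a double publish of the same identity: the rows arrive first as `tab_1`/`tab_2` and then as `tab`, which a subscriber whose `tab` is non-partitioned genuinely needs.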

Regards,
Greg Nancarrow
Fujitsu Australia
