Re: pg_get_publication_tables() output duplicate relid

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Amit Langote <amitlangote09(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_get_publication_tables() output duplicate relid
Date: 2021-11-18 04:53:40
Message-ID: CAA4eK1L+DB9Zt8se0oFnhtgRMGaOXv-6n3A70NJVL4NcV3Qkxw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Nov 17, 2021 at 11:39 AM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
>
> On Tue, Nov 16, 2021 at 10:27 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > On Mon, Nov 15, 2021 at 7:12 PM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> > > > On Mon, Nov 15, 2021 at 1:48 PM houzj(dot)fnst(at)fujitsu(dot)com
> > > > <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> > >
> > > > > create table tbl1 (a int) partition by range (a);
> > > > > create table tbl1_part1 partition of tbl1 for values from (1) to (10);
> > > > > create table tbl1_part2 partition of tbl1 for values from (10) to (20);
> > > > > create publication pub for table
> > > > > tbl1, tbl1_part1 with (publish_via_partition_root=false);
> > >
> > > In the name of consistency, I think this situation should be an error --
> > > I mean, if we detect that the user is trying to add both the partitioned
> > > table *and* its partition, then all manner of things are possibly going
> > > to go wrong in some way, so my inclination is to avoid it altogether.
> > >
> > > Is there any reason to allow that?
> >
> > I think it could provide flexibility to users to later change
> > "publish_via_partition_root" option. Because when that option is
> > false, we use individual partitions schema to send changes and when it
> > is true, we use root table's schema to send changes. Added Amit L. to
> > know if he has any thoughts on this matter as he was the author of
> > this work?
>
> FWIW, I'm not sure that adding an error in this path, that is, when a
> user adds both a partitioned table and its partitions to a
> publication, helps much. As for the safety of allowing it, if you
> look at get_rel_sync_entry(), which handles much of the complexity of
> determining whether to publish a relation's changes and the schema to
> use when doing so, you may be able to see that a partition being added
> duplicatively is harmless, modulo any undiscovered bugs. At least as
> far as the post-initial-sync replication functionality is concerned.
>
> What IS problematic is what a subscriber sees in the
> pg_publication_tables view and the problem occurs only in the initial
> sync phase, where the partition is synced duplicatively because of
> being found in the view along with the parent in this case, that is,
> when publish_via_partiiton_root is true. I was saying on the other
> thread [1] that we should leave it up to the subscriber to decide what
> to do when the view (duplicatively) returns both the parent and the
> partition, citing the use case that a subscriber may want to replicate
> the parent and the partition as independent tables. Though I now tend
> to agree with Amit K that that may be such a meaningful and all that
> common use case, and the implementation on the subscriber side would
> have to be unnecessarily complex.
>
> So maybe it makes sense to just do what has been proposed --
> de-duplicate partitions out of the pg_publication_tables view,
>

AFAICU, there are actually two problems related to
pg_publication_tables view that are being discussed: (a) when
'publish_via_partition_root' is true then it returns both parent and
child tables (provided both are part of publication), this can lead to
duplicate data on the subscriber; the fix for this is being discussed
in thread [1]. (b) when 'publish_via_partition_root' is false, then it
returns duplicate entries for child tables (provided both are part of
publication), this problem is being discussed in this thread.

As per the proposed patches, it seems both need a separate fix, we can
fix them together if we want but I am not sure. Is your understanding
the same?

> unless
> we know of a bigger problem that requires us to hack the subscriber
> side of things too.
>

There is yet another issue that might need subscriber side change. See
the second issue summarized by Hou-San in the email[2]. I feel it is
better to tackle that separately.

> Actually, I came to know of one such problem
> while thinking about this today: when you ATTACH a partition to a
> table that is present in a publish_via_partition_root=true
> publication, it doesn't get copied via the initial sync even though
> subsequent replication works just fine. The reason for that is that
> the subscriber only syncs the partitions that are known at the time
> when the parent is first synced, that too via the parent (as SELECT
> <columns..> FROM parent), and then marks the parent as sync-done.
> Refreshing the subscription after ATTACHing doesn't help, because the
> subscriber can't see any partitions to begin with.
>

Sorry, it is not clear to me how this can lead to a problem. Even if
we just have the root table in the subscriber at the time of sync
later shouldn't copy get the newly attached partition's data and
replicate it to the required partition for
"publish_via_partition_root=true" case? Anyway, if this is a problem
we need to figure the solution for this separately.

[1]: https://www.postgresql.org/message-id/CA%2BHiwqHnDHcT4OOcga9rDFyc7TvDrpN5xFH9J2pyHQo9ptvjmQ%40mail.gmail.com
[2]: https://www.postgresql.org/message-id/OS0PR01MB5716C756312959F293A822C794869%40OS0PR01MB5716.jpnprd01.prod.outlook.com

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2021-11-18 04:57:51 Re: pg_upgrade test for binary compatibility of core data types
Previous Message Fujii Masao 2021-11-18 04:47:50 Re: Slow client can delay replication despite max_standby_streaming_delay set