Re: Report replica identity in pg_publication_tables

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Report replica identity in pg_publication_tables
Date: 2025-11-14 13:31:51
Message-ID: CAExHW5vuPfeXve+mVpDnEzQagCuwkLvkZPzMHS17P4BpiGK7hg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jul 4, 2025 at 8:16 PM Ashutosh Bapat
<ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
>
> On Wed, Jul 2, 2025 at 1:46 PM Zhijie Hou (Fujitsu)
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> >
> > I'm concerned about whether we can correctly display replica identity in the
> > view for partitioned tables.
> >
> > In the case of partitioned tables, we display only the root table in the view
> > if publish_via_partition_root is enabled for the publication. However, the
> > logged information depends on the replica identity (RI) of each child
> > partition. Displaying only the root table's RI could be a bit confusing since
> > each partition might have different RI settings, making it challenging to
> > represent all of them in the entry for the root table.
> >
>
> I see following possibilities:
> 1. If a publication has publish_via_partition_root true, then for all
> the partitioned tables we show "partitioned" or "partition root" in
> replica identity column with a note in documentation that it means
> that the replica identity of the actual leaf partition would be used
> instead of replica identity of the partitioned table. And then it will
> be for the user to find out all the replica identities.
> 2. If a publication has publish_via_partition_root true, then for all
> the partitioned tables we show a set of distinct replica identities of
> the leaf partitions. That way, if all the leaf partitions have the
> same replica identity setting, that will be reported. Otherwise, the
> user will be able to know all the possible replica identity settings
> that may be used. Again, documentation should clarify what this value
> means for a partitioned table. But then somehow we need to indicate
> whether a given table is partitioned or not.
> 3. We report "partitioned" or "partition root" followed by all the
> distinct replica identities, thus letting the user know that the table
> is partitioned and what replica identity settings its partitions have.
> And document.
>
> Since the targeted use of this column is to know the replica
> identities of the tables published by a given publication so that they
> can guess what might be WAL logged, these options seem ok along with
> the clarifying documentation.
>
> What do you think?

I researched this a bit more and then I found some inconsistency
between the code and the documentation

At https://www.postgresql.org/docs/current/sql-createpublication.html,
we mention that changes to a partition table will use identity of a
partititioned table when publish_via_partition_root is true.

--- quote documentation
This parameter determines whether changes in a partitioned table (or
on its partitions) contained in the publication will be published
using the identity and schema of the partitioned table rather than
that of the individual partitions that are actually changed; the
latter is the default. Enabling this allows the changes to be
replicated into a non-partitioned table or a partitioned table
consisting of a different set of partitions.

There can be a case where a subscription combines multiple
publications. If a partitioned table is published by any subscribed
publications which set publish_via_partition_root = true, changes on
this partitioned table (or on its partitions) will be published using
the identity and schema of this partitioned table rather than that of
the individual partitions.
--- unquote documentation

Which identity are we talking about here? If that's replica identity,
the documentation is wrong since code says otherwise
In pub_rf_contains_invalid_column()

/*
* For a partition, if pubviaroot is true, find the topmost ancestor that
* is published via this publication as we need to use its row filter
* expression to filter the partition's changes.
*
* Note that even though the row filter used is for an ancestor, the
* REPLICA IDENTITY used will be for the actual child table.
*/

Let me revise possible solutions so that this view can report replica
identity even in case of publish_via_partition_root

1. If a publication has publish_via_partition_root true, then for all
the partitioned tables we show replica identity of the root partition
suffixed by (mixed). E.g. full (mixed). Add a note in documentation
explaining the meaning of suffix "mixed".

2. If a publication has publish_via_partition_root true, then for all
the partitioned tables we show a set of distinct replica identities of
its partitions and also the replica identity of the root. If all the
partitions have the same replica identity setting, only one replica
identity type will be reported. Otherwise, the user will be able to
know all the possible replica identity settings that may be used. Add
a note to the documentation explaining the meaning of this value for a
partitioned table.

3. Let the view report all the partitions as well as the partition
root. Against each relation, report its replica identity. Expand the
pg_publication_tables view to also report the relation via which the
changes are published (under column name "publish_via"). When
publish_via_partition_root is true, publish_via reports the top
partitioned relation. I think that gives full information about tables
being published through the publication.

What do you think?

--
Best Wishes,
Ashutosh Bapat

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Chao Li 2025-11-14 13:52:08 Re: backend/nodes cleanup: Move loop variables definitions into for statement
Previous Message Ivan Kovmir 2025-11-14 13:15:10 Re: Incorrect checksum in control file with pg_rewind test