| From: | shveta malik <shveta(dot)malik(at)gmail(dot)com> |
|---|---|
| To: | vignesh C <vignesh21(at)gmail(dot)com> |
| Cc: | Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, YeXiu <1518981153(at)qq(dot)com>, Ian Lawrence Barwick <barwick(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com> |
| Subject: | Re: Skipping schema changes in publication |
| Date: | 2026-01-28 05:16:07 |
| Message-ID: | CAJpy0uBfEuzYX+qjAPM+GV5duOwMNqO6fkDtsN1OzONVNR9WGQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Tue, Jan 27, 2026 at 8:25 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Fri, 23 Jan 2026 at 18:41, vignesh C <vignesh21(at)gmail(dot)com> wrote:
> >
> > On Wed, 21 Jan 2026 at 11:35, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > >
> > > On Mon, Jan 19, 2026 at 3:08 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> > > >
> > > > Approaches for Supporting EXCEPT in Partitioned Tables
> > > > ------------------------------------------------------------------------
> > > >
> > > > In an offline discussion with Peter Smith, Amit, and Shlok, we
> > > > identified several approaches for supporting EXCEPT with partitioned
> > > > tables and their partitions. I’d like to hear others’ opinions on
> > > > these approaches.
> > > >
> > > > Consider the following partition hierarchy:
> > > > tab_root
> > > > ├─ tab_part_1
> > > > │ ├─ tab_part_1_p1
> > > > │ └─ tab_part_1_p2
> > > > └─ tab_part_2
> > > > ├─ tab_part_2_p1
> > > > └─ tab_part_2_p2
> > > >
> > > >
> > > > Approach 1:
> > > > ---------------------------------
> > > > If we exclude a table, then the data in that table and all of its
> > > > partitions (i.e., the entire subtree under that table) should not be
> > > > replicated.
> > > >
> > > > For example EXCEPT (tab_part_1) skips replication of tab_part_1 and
> > > > all of its partitions.
> > > >
> > > > This behaviour remains the same with or without
> > > > publish_via_partition_root. The publish_via_partition_root flag only
> > > > affects publish_via_relid, i.e., the relation through which data is
> > > > published.
> > > >
> > > > This approach involves certain implementation challenges. For brevity,
> > > > these are documented in the attached 'Approach1_challenges' document.
> > > >
> > > > Approach 2:
> > > > ---------------------------------------------------
> > > > Assign meaning to ONLY and '*' for partition tables in the EXCEPT
> > > > list. In HEAD, ONLY and '*' do not have any meaning for partitioned
> > > > tables or partitions, and these keywords are currently ignored.
> > > >
> > > > Examples:
> > > > 1. EXCEPT (ONLY tab_part_1) skips replication of only the table
> > > > tab_part_1. Changes for tab_root, tab_part_1_p1, and tab_part_1_p2 are
> > > > still replicated.
> > > >
> > > > ii. EXCEPT (tab_part_1*) skips replication of tables tab_part_1,
> > > > tab_part_1_p1, and tab_part_1_p2
> > > >
> > > > The challenges described in Approach 1, particularly around tablesync
> > > > handling and COPY behaviour, would still need to be addressed under
> > > > this approach as well. ONLY or '*' with partitioned tables is not
> > > > supported in HEAD, supporting it specifically for ALL TABLES EXCEPT
> > > > may introduce additional confusion for users.
> > > >
> > > > Approach 3:
> > > > ----------------
> > > > Do not allow partitions to be specified in the EXCEPT clause.
> > > >
> > > > Only EXCEPT (tab_root) is supported, which excludes tab_root and all
> > > > of its partitions. Specifying EXCEPT (tab_part_1) or EXCEPT
> > > > (tab_part_1_p1) will result in an error.
> > > >
> > > > ~~
> > > >
> > > > While Approach 1 and Approach 2 offer more flexibility to the user
> > > > compared to Approach 3, they also introduce additional design
> > > > complexity which does not seem simpler to address.
> > >
> > > Thanks for explaining this, overall I like the Approach 1, and I also
> > > see the problem when publish via root is given in that case COPY FROM
> > > is executed on the root and it would be hard to exclude specific
> > > partitions.
> >
> > Regarding the above issue which is also mentioned in
> > Approach1_challenges at [1]:
> > When a publication is created with publish_via_partition_root = true
> > and a specific partition(tab_part_1_1) is excluded, the expected
> > behavior is that changes from non-excluded partitions (for example,
> > tab_part_2 and tab_part_1_2 and their descendants) are replicated,
> > while changes from the excluded partition (tab_part_1_1 and its
> > subtree) are not.
> > tab_root
> > ├── tab_part_1
> > │ ├── tab_part_1_1 (except)
> > │ │ ├── tab_part_1_1_1
> > │ │ │ └── tab_part_1_1_1_1
> > │ │ └── tab_part_1_1_2
> > │ └── tab_part_1_2
> > │ ├── tab_part_1_2_1
> > │ └── tab_part_1_2_2
> > └── tab_part_2
> >
> > In this situation, replication cannot be performed purely via the
> > partition root (tab_root), because doing so would implicitly include
> > data from the excluded child partitions.
> >
> > To address this, the publication creation should explicitly record the
> > excluded partition(tab_part_1_1) in pg_publication_rel with an
> > excluded = true flag. The publish_via_partition_root setting remains
> > stored at the publication level, as it is today. With
> > publish_via_partition_root = true, the publisher–subscriber mapping is
> > not partition-to-partition. Instead, all eligible data is mapped to
> > the subscriber’s partition root. Therefore,
> > pg_get_publication_tables() should return only the top-level root
> > table (tab_root) to the subscriber for table synchronization. During
> > initial table sync, when the tablesync worker prepares the COPY
> > command, it can query the publisher to determine the effective set of
> > tables that belong to the publication after applying the exclusion
> > rules. Based on this resolved table list, the tablesync worker can
> > construct a COPY query that unions data only from the non-excluded
> > partitions, for example:
> > COPY (
> > SELECT * FROM tab_part_1_2_1
> > UNION ALL
> > SELECT * FROM tab_part_1_2_2
> > UNION ALL
> > SELECT * FROM tab_part_2
> > )
> >
> > This ensures that only non-excluded data is copied and applied to
> > tab_root on the subscriber, while preserving the semantics of
> > publish_via_partition_root = true.
I agree with the suggested changes in tablesync. It will be good if we
can add these details in the commit-msg section of the patch. Also
please mention how increment replication is impacted (or supposed to
work) with Approach1.
> Here is a patch which has the changes to handle the same.
>
Thank You for the patch.
1)
There are certain parts of Approach 3 still present in Approach 1, as
an example:
1a)
+ For partitioned tables, only the root partitioned table may be specified
+ in <literal>EXCEPT TABLE</literal>.
1b)
+ /*
+ * Only the topmost ancestor of a partitioned table can be specified
+ * in EXCEPT TABLES clause of a FOR ALL TABLES publication. So fetch
+ * the publications excluding the topmost ancestor only.
+ */
+ GetRelationPublications(llast_oid(ancestors), NULL, &exceptpuboids);
+
1c)
+ /* Check if the partiton is part of EXCEPT list of any publication */
+ GetRelationPublications(RelationGetRelid(attachrel), NULL, &except_pubids);
+ if (except_pubids != NIL)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("cannot attach relation \"%s\" as partition because it is
part of EXCEPT list in publication",
+ RelationGetRelationName(attachrel))));
+
Overall, please take a diff of v35 and v37 to find such parts and
please correct these and others (if any).
2)
Also I don't think if below is correct statement for Approach 1:
+ * 2. For a partition, if the topmost ancestor is part of
+ * the EXCEPT TABLE list, we don't publish it.
Even if any ancestor is part of EXECPT list (not only top most) we
should not publish that partition, isn't it?
3)
I tried a scenario and found that incremental replication is not
working correctly. Attached the failing test as Approach1_v37_fail.txt
Once these basic things are corrected, I can review further.
thanks
Shveta
| Attachment | Content-Type | Size |
|---|---|---|
| Approach1_v37_fail.txt | text/plain | 3.6 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Hayato Kuroda (Fujitsu) | 2026-01-28 05:18:15 | RE: Remove unused argument from ApplyLogicalMappingFile() |
| Previous Message | Michael Paquier | 2026-01-28 04:49:58 | Re: Extended Statistics set/restore/clear functions. |