| From: | vignesh C <vignesh21(at)gmail(dot)com> |
|---|---|
| To: | Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
| Cc: | shveta malik <shveta(dot)malik(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, YeXiu <1518981153(at)qq(dot)com>, Ian Lawrence Barwick <barwick(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Skipping schema changes in publication |
| Date: | 2026-01-23 13:11:09 |
| Message-ID: | CALDaNm2x3fR+AEji0ZruTdss-4WDatraXKs1QA44eVnsBmbUiA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Wed, 21 Jan 2026 at 11:35, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Mon, Jan 19, 2026 at 3:08 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> >
> > Approaches for Supporting EXCEPT in Partitioned Tables
> > ------------------------------------------------------------------------
> >
> > In an offline discussion with Peter Smith, Amit, and Shlok, we
> > identified several approaches for supporting EXCEPT with partitioned
> > tables and their partitions. I’d like to hear others’ opinions on
> > these approaches.
> >
> > Consider the following partition hierarchy:
> > tab_root
> > ├─ tab_part_1
> > │ ├─ tab_part_1_p1
> > │ └─ tab_part_1_p2
> > └─ tab_part_2
> > ├─ tab_part_2_p1
> > └─ tab_part_2_p2
> >
> >
> > Approach 1:
> > ---------------------------------
> > If we exclude a table, then the data in that table and all of its
> > partitions (i.e., the entire subtree under that table) should not be
> > replicated.
> >
> > For example EXCEPT (tab_part_1) skips replication of tab_part_1 and
> > all of its partitions.
> >
> > This behaviour remains the same with or without
> > publish_via_partition_root. The publish_via_partition_root flag only
> > affects publish_via_relid, i.e., the relation through which data is
> > published.
> >
> > This approach involves certain implementation challenges. For brevity,
> > these are documented in the attached 'Approach1_challenges' document.
> >
> > Approach 2:
> > ---------------------------------------------------
> > Assign meaning to ONLY and '*' for partition tables in the EXCEPT
> > list. In HEAD, ONLY and '*' do not have any meaning for partitioned
> > tables or partitions, and these keywords are currently ignored.
> >
> > Examples:
> > 1. EXCEPT (ONLY tab_part_1) skips replication of only the table
> > tab_part_1. Changes for tab_root, tab_part_1_p1, and tab_part_1_p2 are
> > still replicated.
> >
> > ii. EXCEPT (tab_part_1*) skips replication of tables tab_part_1,
> > tab_part_1_p1, and tab_part_1_p2
> >
> > The challenges described in Approach 1, particularly around tablesync
> > handling and COPY behaviour, would still need to be addressed under
> > this approach as well. ONLY or '*' with partitioned tables is not
> > supported in HEAD, supporting it specifically for ALL TABLES EXCEPT
> > may introduce additional confusion for users.
> >
> > Approach 3:
> > ----------------
> > Do not allow partitions to be specified in the EXCEPT clause.
> >
> > Only EXCEPT (tab_root) is supported, which excludes tab_root and all
> > of its partitions. Specifying EXCEPT (tab_part_1) or EXCEPT
> > (tab_part_1_p1) will result in an error.
> >
> > ~~
> >
> > While Approach 1 and Approach 2 offer more flexibility to the user
> > compared to Approach 3, they also introduce additional design
> > complexity which does not seem simpler to address.
>
> Thanks for explaining this, overall I like the Approach 1, and I also
> see the problem when publish via root is given in that case COPY FROM
> is executed on the root and it would be hard to exclude specific
> partitions.
Regarding the above issue which is also mentioned in
Approach1_challenges at [1]:
When a publication is created with publish_via_partition_root = true
and a specific partition(tab_part_1_1) is excluded, the expected
behavior is that changes from non-excluded partitions (for example,
tab_part_2 and tab_part_1_2 and their descendants) are replicated,
while changes from the excluded partition (tab_part_1_1 and its
subtree) are not.
tab_root
├── tab_part_1
│ ├── tab_part_1_1 (except)
│ │ ├── tab_part_1_1_1
│ │ │ └── tab_part_1_1_1_1
│ │ └── tab_part_1_1_2
│ └── tab_part_1_2
│ ├── tab_part_1_2_1
│ └── tab_part_1_2_2
└── tab_part_2
In this situation, replication cannot be performed purely via the
partition root (tab_root), because doing so would implicitly include
data from the excluded child partitions.
To address this, the publication creation should explicitly record the
excluded partition(tab_part_1_1) in pg_publication_rel with an
excluded = true flag. The publish_via_partition_root setting remains
stored at the publication level, as it is today. With
publish_via_partition_root = true, the publisher–subscriber mapping is
not partition-to-partition. Instead, all eligible data is mapped to
the subscriber’s partition root. Therefore,
pg_get_publication_tables() should return only the top-level root
table (tab_root) to the subscriber for table synchronization. During
initial table sync, when the tablesync worker prepares the COPY
command, it can query the publisher to determine the effective set of
tables that belong to the publication after applying the exclusion
rules. Based on this resolved table list, the tablesync worker can
construct a COPY query that unions data only from the non-excluded
partitions, for example:
COPY (
SELECT * FROM tab_part_1_2_1
UNION ALL
SELECT * FROM tab_part_1_2_2
UNION ALL
SELECT * FROM tab_part_2
)
This ensures that only non-excluded data is copied and applied to
tab_root on the subscriber, while preserving the semantics of
publish_via_partition_root = true.
Thoughts?
Regards,
Vignesh
| From | Date | Subject | |
|---|---|---|---|
| Next Message | tushar | 2026-01-23 13:36:47 | Re: Non-text mode for pg_dumpall |
| Previous Message | Heikki Linnakangas | 2026-01-23 12:30:15 | Re: Is abort() still needed in WalSndShutdown()? |