Re: Skipping schema changes in publication

From: Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, YeXiu <1518981153(at)qq(dot)com>, Ian Lawrence Barwick <barwick(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Skipping schema changes in publication
Date: 2025-07-19 10:44:03
Message-ID: CANhcyEW2LK4diNeCG862DE40yQoV3VAgf59kXUq2TuR8fnw5vQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 30 Jun 2025 at 11:37, Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Hi Shlok.
>
> Some review comments for v15-0003.
>
> ======
> doc/src/sgml/catalogs.sgml
>
> 1.
> <para>
> - True if the relation must be excluded
> + True if the column list or relation must be excluded from publication.
> + If a column list is specified in <literal>prattrs</literal>, then
> + exclude only those columns. If <literal>prattrs</literal> is NULL,
> + then exclude the entire relation.
> </para></entry>
>
> I noticed other fields on this page say "null" instead of "NULL". It
> seems like "null" is more conventional.
>
Fixed

> ======
> doc/src/sgml/logical-replication.sgml
>
> 2.
> <para>
> If no column list is specified, any columns added to the table later are
> automatically replicated. This means that having a column list which names
> - all columns is not the same as having no column list at all.
> + all columns is not the same as having no column list at all.
> Similarly, if an
> + column list is specified with EXCEPT, any columns added to the table later
> + are also replicated automatically.
> </para>
>
> 2a.
> CURRENTLY
> If no column list or a column list with EXCEPT is specified, any
> columns added to the table later are automatically replicated. This
> means that having a column list which names all columns is not the
> same as having no column list at all. If an column list is specified,
> any columns added to the table later are automatically replicated.
>
> ~
>
> That still doesn't quite make sense. I think instead of saying "This
> means..." it needs to say something a bit like below:
>
> However, a normal column list (without EXCEPT) only
> specified columns and no more. Therefore, having a column list that
> names all columns is not the same as having no column list at all, as
> more columns may be added to the table later.
>
Fixed

> ~
>
> 2b.
> And the final sentence "If an column list..." looks like a cut/paste error (??)
>
Yes it was a mistake.

> ~
>
> 2c.
> Maybe here EXCEPT should be written as <literal>EXCEPT</literal>
>
Fixed.

> ~~~
>
> 2.5A.
> The description about generated columns still says this:
>
> CURRENT:
> Generated columns can also be specified in a column list. This allows
> generated columns to be published, regardless of the publication
> parameter publish_generated_columns. See Section 29.6 for details.
>
> ~
>
> But I don't think it is quite correct. IMO gencols behaviour is much
> more subtle...
>
> e.g.
>
> a) Normal collist - these named cols are published REGARDLESS of the
> 'publish_generated_cols' parameter (same as before)
>
> b) EXCEPT collist - you can specify gencols in the list REGARDLESS of
> the 'publish_generated_cols' parameter, because since they are named
> as "except" then they will not be published anyhow....
>
> c) BUT for EXCEPT collist case, I think any gencols that are *not*
> covered by that EXCEPT collist should follow the rules according to
> the 'publish_generated_cols' parameter.
>
> So, it is much more tricky than the docs currently say:
>
Modified the documentation

> Also
>
> 2.5B.
> - The text says "See Section 29.6 for details," but there are no
> examples of these combinations (e.g. EXCEPT collist and diff parameter
> setting)
>
Added documentation.

> 2.5C,
> - The regression tests also need to be more complex to cover these
>
Added tests related to these

> 2.5D.
> - You might need to add something in the CREATE PUBLICATION "NOTES"
> section after all -- even if it just refers to here.
>
Added documentation

> ~~~
>
> 3.
> <para>
> Create a publication <literal>p1</literal>. A column list is defined for
> - table <literal>t1</literal> to reduce the number of columns that will be
> - replicated. Notice that the order of column names in the column list does
> - not matter.
> + table <literal>t1</literal>, and another column list is defined for table
> + <literal>t2</literal> using the EXCEPT clause to reduce the number of
> + columns that will be replicated. Note that the order of column names in
> + the column lists does not matter.
> <programlisting>
> -/* pub # */ CREATE PUBLICATION p1 FOR TABLE t1 (id, b, a, d);
> +/* pub # */ CREATE PUBLICATION p1 FOR TABLE t1 (id, b, a, d), t2 EXCEPT (d, a);
> </programlisting></para>
>
> Maybe here EXCEPT should be written as <literal>EXCEPT</literal>
>
Fixed

> ======
> doc/src/sgml/ref/create_publication.sgml
>
> 4.
> <para>
> - When a column list is specified, only the named columns are replicated.
> - The column list can contain stored generated columns as well. If the
> - column list is omitted, the publication will replicate all non-generated
> - columns (including any added in the future) by default. Stored generated
> - columns can also be replicated if
> <literal>publish_generated_columns</literal>
> - is set to <literal>stored</literal>. Specifying a column list has no
> - effect on <literal>TRUNCATE</literal> commands. See
> + When a column list without EXCEPT is specified, only the named
> columns are
> + replicated. The column list can contain stored generated columns as well.
> + If the column list is omitted, the publication will replicate
> + all non-generated columns (including any added in the future) by default.
> + Stored generated columns can also be replicated if
> + <literal>publish_generated_columns</literal> is set to
> + <literal>stored</literal>. Specifying a column list has no effect on
> + <literal>TRUNCATE</literal> commands. See
> <xref linkend="logical-replication-col-lists"/> for details about column
> lists.
> </para>
>
> Maybe here EXCEPT should be written as <literal>EXCEPT</literal>
>
Fixed
> ~~~
>
> 5.
> + <para>
> + When a column list is specified with EXCEPT, the named columns are not
> + replicated. Specifying a column list has no effect on
> + <literal>TRUNCATE</literal> commands.
> + </para>
>
> Maybe here EXCEPT should be written as <literal>EXCEPT</literal>.
>
Fixed

> ** Note all the extra subtleties that I mentioned in the review
> comment #2.5 above --- e.g. IMO any *un-listed* gencols still should
> follow the parameter rules.
>
> ~~~
>
> 6.
> <para>
> Any column list must include the <literal>REPLICA IDENTITY</literal> columns
> - in order for <command>UPDATE</command> or <command>DELETE</command>
> - operations to be published. There are no column list restrictions if the
> - publication publishes only <command>INSERT</command> operations.
> + and any column list specified with EXCEPT must not include the
> + <literal>REPLICA IDENTITY</literal> columns in order for
> + <command>UPDATE</command> or <command>DELETE</command> operations to be
> + published. There are no column list restrictions if the
> publication publishes
> + only <command>INSERT</command> operations.
> </para>
>
> 6a.
> CURRENT:
> Any column list must include the REPLICA IDENTITY columns, and any
> column list specified with EXCEPT must not include the REPLICA
> IDENTITY columns in order for UPDATE or DELETE operations to be
> published.
>
> ~
>
> I felt that might be better expressed the other way around. Also, it
> might be better to say "not name" instead of "not include" because
> EXCEPT + include seemed a bit contrary.
>
>
> SUGGESTION (maybe like this)
> In order for UPDATE or DELETE operations to work, all the REPLICA
> IDENTITY columns must be published. So, any column list must name all
> REPLICA IDENTITY columns, and any EXCEPT column list must not name any
> REPLICA IDENTITY columns.
>
Fixed

> ~~
>
> 6b.
> Maybe here EXCEPT should be written as <literal>EXCEPT</literal>
>
Fixed

> ======
> src/backend/catalog/pg_publication.c
>
> check_and_fetch_column_list:
>
> 7.
> + /* Lookup the except attribute */
> + cfdatum = SysCacheGetAttr(PUBLICATIONRELMAP, cftuple,
> + Anum_pg_publication_rel_prexcept, &isnull);
> +
> + if (!isnull)
> + {
> + Assert(!pub->alltables);
> + *except_columns = DatumGetBool(cfdatum);
> + }
> +
>
> I felt it would be safer to also assign *except_columns = false;
> up-front so the caller could be sure this flag was meaningful on
> return.
>
Fixed

> ~~~
>
> pub_form_cols_map:
>
> 8.
> Maybe use snake case like for other params, so /excepcols/except_cols/
>
Fixed

> ~~~
>
> pg_get_publication_tables:
>
> 9.
>
> I felt all the logic in this function maybe can be simpler:
>
> e.g. If you just have "Bitmapset *except_columns = NULL;" then null
> nmeans there is no except columns; otherwise there is. This means you
> don't need a separate 'bool except_column' variable.
>
> e.g. Assign the Bitmapset *except_columns after you already have the
> values[2], instead of doing it later.
>
> e.g. The skip code if (except_columns && bms_is_member(att->attnum,
> columns)) could just check the list member, I think, without the
> additional bool.
>
> ~~~
>
Fixed

> 10.
> + /*
> + * We fetch pubtuple if publication is not FOR ALL TABLES and not
> + * FOR TABLES IN SCHEMA. So if prexcept is true, it indicate that
> + * prattrs contains columns to be excluded for replication.
> + */
> + if (!isnull)
> + except_columns = DatumGetBool(exceptDatum);
>
>
> /indicate/indicates/
>
Fixed

> ======
> src/backend/parser/gram.y
>
> 11.
> + | TABLE relation_expr EXCEPT opt_except_column_list OptWhereClause
> + {
> + $$ = makeNode(PublicationObjSpec);
> + $$->pubobjtype = PUBLICATIONOBJ_TABLE;
> + $$->pubtable = makeNode(PublicationTable);
> + $$->pubtable->relation = $2;
> + $$->pubtable->columns = $4;
> + $$->pubtable->whereClause = $5;
> + $$->pubtable->except = true;
> + $$->location = @1;
> + }
>
> I wasn't expecting you would need another 'opt_except_column_list' and
> all the code duplication that causes. AFAIK, the syntax is identical
> for 'opt_column_list' apart from the preceding EXCEPT so I thought all
> you need is to allow the 'opt_column_list' to have an optional EXCEPT
> qualifier.
>
The main reason I used a separate 'opt_except_column_list' is because
'opt_column_list' can also be NULL. But the column list specified with
EXCEPT not be NULL. So, 'opt_except_column_list' is defined such that
it cannot be null.

> ======
> src/backend/replication/pgoutput/pgoutput.c
>
> 12.
> +
> + /*
> + * Indicates whether no columns are published for a given relation. With
> + * the introduction of the EXCEPT clause in column lists, it is now
> + * possible to define a publication that excludes all columns of a table.
> + * However, the 'columns' attribute cannot represent this case, since a
> + * NULL value implies that all columns are published. To distinguish this
> + * scenario, the 'no_cols_published' flag is introduced.
> + */
> + bool no_cols_published;
> } RelationSyncEntry;
>
> But, what about when Bitmapset *columns is not null, but has no bits
> set -- doesn't that mean the same as "no columns"?
>
I think this is possible. A bitmapset which has no set bit is NULL. I
saw following comment in bitmapset.c
"By convention, we always represent a set with
* the minimum possible number of words, i.e, there are never any trailing
* zero words. Enforcing this requires that an empty set is represented as
* NULL. Because an empty Bitmapset is represented as NULL, a non-NULL
* Bitmapset always has at least 1 Bitmapword."

> ======
> src/include/catalog/pg_publication.h
>
> 13.
> extern Bitmapset *pub_form_cols_map(Relation relation,
> - PublishGencolsType include_gencols_type);
> + PublishGencolsType include_gencols_type,
> + Bitmapset *exceptcols);
>
> Maybe snake-case like the other params: /exceptcols/except_cols/
>
Fixed

> ======
> src/test/regress/sql/publication.sql
>
> 14.
> +-- Verify that publication is created with EXCEPT
> +CREATE PUBLICATION testpub_except FOR TABLE pub_test_except1,
> pub_sch1.pub_test_except2 EXCEPT (b, c);
> +SELECT * FROM pg_publication_tables WHERE pubname = 'testpub_except';
> +
>
> I think tests should also use psql \dRp+ commands in places to show
> that the "describe" stuff is working correctly.
>
> ~~~
Fixed

>
> 15.
> +-- Check for invalid cases
> +CREATE PUBLICATION testpub_except2 FOR TABLES IN SCHEMA pub_sch1,
> TABLE pub_test_except1 EXCEPT (b, c);
> +CREATE PUBLICATION testpub_except2 FOR TABLE pub_test_except1 EXCEPT;
>
> Should explain more about what you are testing here:
> a) cannot use EXCEPT col-lists combined with TABLES IN SCHEMA
> b) syntax error EXCEPT without a col-list
>
> ~~~
fixed

>
> 16.
> +-- Verify that publication can be altered with EXCEPT
> +ALTER PUBLICATION testpub_except SET TABLE pub_test_except1 EXCEPT
> (a, b), pub_sch1.pub_test_except2;
> +SELECT * FROM pg_publication_tables WHERE pubname = 'testpub_except';
>
> The comment is a bit misleading because there are many kinds of
> "alter". Maybe say more like
> Verify ok - ALTER PUBLICATION ... SET ... EXCEPT (col-list)
>
> ~~~
Fixed

>
> 17.
> +-- Verify ALTER PUBLICATION ... DROP
> +ALTER PUBLICATION testpub_except DROP TABLE pub_test_except1 EXCEPT (a, b);
> +ALTER PUBLICATION testpub_except DROP TABLE pub_test_except1;
>
> Should explain more:
> +-- Verify fails - ALTER PUBLICATION ... DROP ... EXCEPT (col-list)
> +-- Verify ok - ALTER PUBLICATION ... DROP ...
>
> ~~~
Fixed

>
> 18.
> +ALTER PUBLICATION testpub_except ADD TABLE pub_test_except1 EXCEPT (c, d);
> +SELECT * FROM pg_publication_tables WHERE pubname = 'testpub_except';
>
> Missing comment:
> +-- Verify ok - ALTER PUBLICATION ... ADD ... EXCEPT (col-list)
>
> ~~~
Fixed

>
> 19.
> +-- Verify excluded columns cannot be part of REPLICA IDENTITY
> +ALTER TABLE pub_test_except1 REPLICA IDENTITY FULL;
> +UPDATE pub_test_except1 SET a = 3 WHERE a = 1;
>
> +CREATE UNIQUE INDEX pub_test_except1_a_idx ON pub_test_except1 (a, c);
> +ALTER TABLE pub_test_except1 REPLICA IDENTITY USING INDEX
> pub_test_except1_a_idx;
> +UPDATE pub_test_except1 SET a = 3 WHERE a = 1;
>
> +DROP INDEX pub_test_except1_a_idx;
> +CREATE UNIQUE INDEX pub_test_except1_a_idx ON pub_test_except1 (a);
> +ALTER TABLE pub_test_except1 REPLICA IDENTITY USING INDEX
> pub_test_except1_a_idx;
> +UPDATE pub_test_except1 SET a = 3 WHERE a = 1;
> +
> +DROP INDEX pub_test_except1_a_idx;
>
> 19a.
> IIUC, really there are multiple tests here, so I think it should all
> be split and commented separately.
>
> a) Verify that EXCEPT col-list cannot contain RI cols (when using RI FULL)
> b) Verify that EXCEPT col-list cannot contain RI cols (when using INDEX)
> c) Verify that so long as no clash between RI cols and the EXCEPT
> col-list, then it is ok
>
> ~
Fixed

>
> 19b.
> IMO, some index names could be better:
>
> CREATE UNIQUE INDEX pub_test_except1_a_idx ON pub_test_except1 (a, c);
> How about 'pub_test_except1_ac_idx'?
>
> ~~~
>
Fixed

> 20.
> +DROP PUBLICATION testpub_except;
> +DROP TABLE pub_test_except1;
> +DROP TABLE pub_sch1.pub_test_except2;
>
> Add a "cleanup" comment.
>
Added

I have addressed the comments and added the latest v16.

Thanks and Regards,
Shlok Kyal

Attachment Content-Type Size
v16-0002-Skip-publishing-the-tables-specified-in-EXCEPT-T.patch application/octet-stream 71.8 KB
v16-0001-Add-RESET-clause-to-Alter-Publication-which-will.patch application/octet-stream 20.2 KB
v16-0003-Skip-publishing-the-columns-specified-in-FOR-TAB.patch application/octet-stream 61.8 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Shlok Kyal 2025-07-19 10:44:51 Re: Skipping schema changes in publication
Previous Message Michael J. Baars 2025-07-19 09:44:46 Re: Upgrade from Fedora 40 to Fedora 42, or from PostgreSQL 16.3 to PostgreSQL 16.9