Re: Column Filtering in Logical Replication

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Column Filtering in Logical Replication
Date: 2022-02-15 23:57:31
Message-ID: 6e0943fc-9cd7-45e9-6c1e-f9279fd36e2b@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Peter,

Thanks for the review and sorry for taking so long.

I've addressed most of the comments in the patch I sent a couple minutes
ago. More comments in-line:

On 1/28/22 09:39, Peter Smith wrote:
> Here are some review comments for the v17-0001 patch.
>
> ~~~
>
> 1. Commit message
>
> If no column list is specified, all the columns are replicated, as
> previously
>
> Missing period (.) at the end of that sentence.
>

I plan to reword that anyway.

> ~~~
>
> 2. doc/src/sgml/catalogs.sgml
>
> + <para>
> + This is an array of values that indicates which table columns are
> + part of the publication. For example a value of <literal>1 3</literal>
> + would mean that the first and the third table columns are published.
> + A null value indicates that all attributes are published.
> + </para></entry>
>
> Missing comma:
> "For example" --> "For example,"
>

Fixed.

> Terms:
> The text seems to jump between "columns" and "attributes". Perhaps,
> for consistency, that last sentence should say: "A null value
> indicates that all columns are published."
>

Yeah, but that's a pre-existing problem. I've modified the parts added
by the patch to use "columns" though.

> ~~~
>
> 3. doc/src/sgml/protocol.sgml
>
> </variablelist>
> - Next, the following message part appears for each column
> (except generated columns):
> + Next, the following message part appears for each column (except
> + generated columns and other columns that don't appear in the column
> + filter list, for tables that have one):
> <variablelist>
>
> Perhaps that can be expressed more simply, like:
>
> Next, the following message part appears for each column (except
> generated columns and other columns not present in the optional column
> filter list):
>

Not sure. I'll think about it.

> ~~~
>
> 4. doc/src/sgml/ref/alter_publication.sgml
>
> +ALTER PUBLICATION <replaceable class="parameter">name</replaceable>
> ALTER TABLE <replaceable
> class="parameter">publication_object</replaceable> SET COLUMNS { (
> <replaceable class="parameter">name</replaceable> [, ...] ) | ALL }
>
> The syntax chart looks strange because there is already a "TABLE" and
> a column_name list within the "publication_object" definition, so do
> ALTER TABLE and publication_object co-exist?
> According to the current documentation it suggests nonsense like below is valid:
> ALTER PUBLICATION mypublication ALTER TABLE TABLE t1 (a,b,c) SET
> COLUMNS (a,b,c);
>

Yeah, I think that's wrong. I think "publication_object" is wrong in
this place, so I've used "table_name".

> --
>
> But more fundamentally, I don't see why any new syntax is even needed at all.
>
> Instead of:
> ALTER PUBLICATION mypublication ALTER TABLE users SET COLUMNS
> (user_id, firstname, lastname);
> Why not just:
> ALTER PUBLICATION mypublication ALTER TABLE users (user_id, firstname,
> lastname);
>

I haven't modified the grammar yet, but I agree SET COLUMNS seems a bit
unnecessary. It also seems a bit inconsistent with ADD TABLE which
simply lists the columns right adter the table name.

> Then, if the altered table defines a *different* column list then it
> would be functionally equivalent to whatever your SET COLUMNS is doing
> now. AFAIK this is how the Row-Filter [1] works, so that altering an
> existing table to have a different Row-Filter just overwrites that
> table's filter. IMO the Col-Filter behaviour should work the same as
> that - "SET COLUMNS" is redundant.
>

I'm sorry, I don't understand what this is saying :-(

> ~~~
>
> 5. doc/src/sgml/ref/alter_publication.sgml
>
> - TABLE [ ONLY ] <replaceable
> class="parameter">table_name</replaceable> [ * ] [, ... ]
> + TABLE [ ONLY ] <replaceable
> class="parameter">table_name</replaceable> [ * ] [ ( <replaceable
> class="parameter">column_name</replaceable>, [, ... ] ) ] [, ... ]
>
> That extra comma after the "column_name" seems wrong because there is
> one already in "[, ... ]".
>

Fixed.

> ~~~
>
> 6. doc/src/sgml/ref/create_publication.sgml
>
> - TABLE [ ONLY ] <replaceable
> class="parameter">table_name</replaceable> [ * ] [, ... ]
> + TABLE [ ONLY ] <replaceable
> class="parameter">table_name</replaceable> [ * ] [ ( <replaceable
> class="parameter">column_name</replaceable>, [, ... ] ) ] [, ... ]
>
> (Same as comment #5).
> That extra comma after the "column_name" seems wrong because there is
> one already in "[, ... ]".
>

Fixed.

> ~~~
>
> 7. doc/src/sgml/ref/create_publication.sgml
>
> + <para>
> + When a column list is specified, only the listed columns are replicated;
> + any other columns are ignored for the purpose of replication through
> + this publication. If no column list is specified, all columns of the
> + table are replicated through this publication, including any columns
> + added later. If a column list is specified, it must include the replica
> + identity columns.
> + </para>
>
> Suggest to re-word this a bit simpler:
>
> e.g.
> - "listed columns" --> "named columns"
> - I don't think it is necessary to say the unlisted columns are ignored.
> - I didn't think it is necessary to say "though this publication"
>
> AFTER
> When a column list is specified, only the named columns are replicated.
> If no column list is specified, all columns of the table are replicated,
> including any columns added later. If a column list is specified, it must
> include the replica identity columns.
>

Fixed, seems reasonable.

> ~~~
>
> 8. doc/src/sgml/ref/create_publication.sgml
>
> Consider adding another example showing a CREATE PUBLICATION which has
> a column list.
>

Added.

> ~~~
>
> 9. src/backend/catalog/pg_publication.c - check_publication_add_relation
>
> /*
> - * Check if relation can be in given publication and throws appropriate
> - * error if not.
> + * Check if relation can be in given publication and that the column
> + * filter is sensible, and throws appropriate error if not.
> + *
> + * targetcols is the bitmapset of attribute numbers given in the column list,
> + * or NULL if it was not specified.
> */
>
> Typo: "targetcols" --> "columns" ??
>

Right, I noticed that too.

> ~~~
>
> 10. src/backend/catalog/pg_publication.c - check_publication_add_relation
>
> +
> + /* Make sure the column list checks out */
> + if (columns != NULL)
> + {
>
> Perhaps "checks out" could be worded better.
>

Right, I expanded that in my review.

> ~~~
>
> 11. src/backend/catalog/pg_publication.c - check_publication_add_relation
>
> + /* Make sure the column list checks out */
> + if (columns != NULL)
> + {
> + /*
> + * Even if the user listed all columns in the column list, we cannot
> + * allow a column list to be specified when REPLICA IDENTITY is FULL;
> + * that would cause problems if a new column is added later, because
> + * the new column would have to be included (because of being part of
> + * the replica identity) but it's technically not allowed (because of
> + * not being in the publication's column list yet). So reject this
> + * case altogether.
> + */
> + if (replidentfull)
> + ereport(ERROR,
> + errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> + errmsg("invalid column list for publishing relation \"%s\"",
> + RelationGetRelationName(targetrel)),
> + errdetail("Cannot specify a column list on relations with REPLICA
> IDENTITY FULL."));
> +
> + check_publication_columns(pub, targetrel, columns);
> + }
>
> IIUC almost all of the above comment and code is redundant because by
> calling the check_publication_columns function it will do exactly the
> same check...
>
> So, that entire slab might be replaced by 2 lines:
>
> if (columns != NULL)
> check_publication_columns(pub, targetrel, columns);
>

You're right. But I think we can make that even simpler by moving even
the (columns!=NULL) check into the function.

> ~~~
>
> 12. src/backend/catalog/pg_publication.c - publication_set_table_columns
>
> +publication_set_table_columns(Relation pubrel, HeapTuple pubreltup,
> + Relation targetrel, List *columns)
> +{
> + Bitmapset *attset;
> + AttrNumber *attarray;
> + HeapTuple copytup;
> + int natts;
> + bool nulls[Natts_pg_publication_rel];
> + bool replaces[Natts_pg_publication_rel];
> + Datum values[Natts_pg_publication_rel];
> +
> + memset(values, 0, sizeof(values));
> + memset(nulls, 0, sizeof(nulls));
> + memset(replaces, false, sizeof(replaces));
>
> It seemed curious to use memset false for "replaces" but memset 0 for
> "nulls", since they are both bool arrays (??)
>

Fixed.

> ~~~
>
> 13. src/backend/catalog/pg_publication.c - compare_int16
>
> +/* qsort comparator for attnums */
> +static int
> +compare_int16(const void *a, const void *b)
> +{
> + int av = *(const int16 *) a;
> + int bv = *(const int16 *) b;
> +
> + /* this can't overflow if int is wider than int16 */
> + return (av - bv);
> +}
>
> This comparator seems common with another one already in the PG
> source. Perhaps it would be better for generic comparators (like this
> one) to be in some common code instead of scattered cut/paste copies
> of the same thing.
>

I thought about it, but it doesn't really seem worth the effort.

> ~~~
>
> 14. src/backend/commands/publicationcmds.c - AlterPublicationTables
>
> + else if (stmt->action == AP_SetColumns)
> + {
> + Assert(schemaidlist == NIL);
> + Assert(list_length(tables) == 1);
> +
> + PublicationSetColumns(stmt, pubform,
> + linitial_node(PublicationTable, tables));
> + }
>
> (Same as my earlier review comment #4)
>
> Suggest to call this PublicationSetColumns based on some smarter
> detection logic of a changed column list. Please refer to the
> Row-Filter patch [1] for this same function.
>

I don't understand. Comment #4 is about syntax, no?

> ~~~
>
> 15. src/backend/commands/publicationcmds.c - AlterPublicationTables
>
> + /* This is not needed to delete a table */
> + pubrel->columns = NIL;
>
> Perhaps a more explanatory comment would be better there?
>

If I understand the comment, it says we don't actually need to set
columns to NIL. In which case we can just get rid of the change.

> ~~~
>
> 16. src/backend/commands/tablecmds.c - relation_mark_replica_identity
>
> @@ -15841,6 +15871,7 @@ relation_mark_replica_identity(Relation rel,
> char ri_type, Oid indexOid,
> CatalogTupleUpdate(pg_index, &pg_index_tuple->t_self, pg_index_tuple);
> InvokeObjectPostAlterHookArg(IndexRelationId, thisIndexOid, 0,
> InvalidOid, is_internal);
> +
> /*
> * Invalidate the relcache for the table, so that after we commit
> * all sessions will refresh the table's replica identity index
>
> Spurious whitespace change seemed unrelated to the Col-Filter patch.
>

Fixed.

> ~~~
>
> 17. src/backend/parser/gram.y
>
> *
> + * ALTER PUBLICATION name SET COLUMNS table_name (column[, ...])
> + * ALTER PUBLICATION name SET COLUMNS table_name ALL
> + *
>
> (Same as my earlier review comment #4)
>
> IMO there was no need for the new syntax of SET COLUMNS.
>

Not modified yet, we'll see about the syntax.

> ~~~
>
> 18. src/backend/replication/logical/proto.c - logicalrep_write_attrs
>
> - /* send number of live attributes */
> - for (i = 0; i < desc->natts; i++)
> - {
> - if (TupleDescAttr(desc, i)->attisdropped || TupleDescAttr(desc,
> i)->attgenerated)
> - continue;
> - nliveatts++;
> - }
> - pq_sendint16(out, nliveatts);
> -
> /* fetch bitmap of REPLICATION IDENTITY attributes */
> replidentfull = (rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL);
> if (!replidentfull)
> idattrs = RelationGetIdentityKeyBitmap(rel);
>
> + /* send number of live attributes */
> + for (i = 0; i < desc->natts; i++)
> + {
> + Form_pg_attribute att = TupleDescAttr(desc, i);
> +
> + if (att->attisdropped || att->attgenerated)
> + continue;
> + if (columns != NULL && !bms_is_member(att->attnum, columns))
> + continue;
> + nliveatts++;
> + }
> + pq_sendint16(out, nliveatts);
> +
>
> This change seemed to have the effect of moving that 4 lines of
> "replidentfull" code from below the loop to above the loop. But moving
> that code seems unrelated to the Col-Filter patch. (??).
>

Right, restored the original code.

> ~~~
>
> 19. src/backend/replication/logical/tablesync.c - fetch_remote_table_info
>
> @@ -793,12 +877,12 @@ fetch_remote_table_info(char *nspname, char *relname,
>
> ExecClearTuple(slot);
> }
> +
> ExecDropSingleTupleTableSlot(slot);
> -
> - lrel->natts = natt;
> -
> walrcv_clear_result(res);
> pfree(cmd.data);
> +
> + lrel->natts = natt;
> }
>
> The shuffling of those few lines seems unrelated to any requirement of
> the Col-Filter patch (??)
>

Yep, undone. I'd bet this is simply due to older versions of the patch
touching this place, and then undoing some of it.

> ~~~
>
> 20. src/backend/replication/logical/tablesync.c - copy_table
>
> + for (int i = 0; i < lrel.natts; i++)
> + {
> + appendStringInfoString(&cmd, quote_identifier(lrel.attnames[i]));
> + if (i < lrel.natts - 1)
> + appendStringInfoString(&cmd, ", ");
> + }
>
> Perhaps that could be expressed more simply if the other way around like:
>
> for (int i = 0; i < lrel.natts; i++)
> {
> if (i)
> appendStringInfoString(&cmd, ", ");
> appendStringInfoString(&cmd, quote_identifier(lrel.attnames[i]));
> }
>

I used a slightly different version.

> ~~~
>
> 21. src/backend/replication/pgoutput/pgoutput.c
>
> +
> + /*
> + * Set of columns included in the publication, or NULL if all columns are
> + * included implicitly. Note that the attnums in this list are not
> + * shifted by FirstLowInvalidHeapAttributeNumber.
> + */
> + Bitmapset *columns;
>
> Typo: "in this list" --> "in this set" (??)
>

"bitmap" is what we call Bitmapset so I used that.

> ~~~
>
> 22. src/backend/replication/pgoutput/pgoutput.c - get_rel_sync_entry
>
> * Don't publish changes for partitioned tables, because
> - * publishing those of its partitions suffices, unless partition
> - * changes won't be published due to pubviaroot being set.
> + * publishing those of its partitions suffices. (However, ignore
> + * this if partition changes are not to published due to
> + * pubviaroot being set.)
> */
>
> This change seems unrelated to the Col-Filter patch, so perhaps it
> should not be here at all.
>
> Also, typo: "are not to published"
>

Yeah, unrelated. Reverted.

> ~~~
>
> 23. src/backend/replication/pgoutput/pgoutput.c - get_rel_sync_entry
>
> + /*
> + * Obtain columns published by this publication, and add them
> + * to the list for this rel. Note that if at least one
> + * publication has a empty column list, that means to publish
> + * everything; so if we saw a publication that includes all
> + * columns, skip this.
> + */
>
> Typo: "a empty" --> "an empty"
>

Fixed.

> ~~~
>
> 24. src/backend/replication/pgoutput/pgoutput.c - get_rel_sync_entry
>
> + if (isnull)
> + {
> + /*
> + * If we see a publication with no columns, reset the
> + * list and ignore further ones.
> + */
>
> Perhaps that comment is meant to say "with no column filter" instead
> of "with no columns"?
>

Yep, fixed.

> ~~~
>
> 25. src/backend/replication/pgoutput/pgoutput.c - get_rel_sync_entry
>
> + if (isnull)
> + {
> ...
> + }
> + else if (!isnull)
> + {
> ...
> + }
>
> Is the "if (!isnull)" in the else just to be really REALLY sure it is not null?
>

Double-tap ;-) Removed the condition.

> ~~~
>
> 26. src/bin/pg_dump/pg_dump.c - getPublicationTables
>
> + pubrinfo[i].pubrattrs = attribs->data;
> + }
> + else
> + pubrinfo[j].pubrattrs = NULL;
>
> I got confused reading this code. Are those different indices 'i' and
> 'j' correct?
>

Good catch! I think you're right and it should be "j" in both places.
This'd only cause trouble in selective pg_dumps (when dumping selected
tables). The patch clearly needs some pg_dump tests.

> ~~~
>
> 27. src/bin/psql/describe.c
>
> The Row-Filter [1] displays filter information not only for the psql
> \dRp+ command but also for the psql \d <tablename> command. Perhaps
> the Col-Filter patch should do that too.
>

Not sure.

> ~~~
>
> 28. src/bin/psql/tab-complete.c
>
> @@ -1657,6 +1657,8 @@ psql_completion(const char *text, int start, int end)
> /* ALTER PUBLICATION <name> ADD */
> else if (Matches("ALTER", "PUBLICATION", MatchAny, "ADD"))
> COMPLETE_WITH("ALL TABLES IN SCHEMA", "TABLE");
> + else if (Matches("ALTER", "PUBLICATION", MatchAny, "ADD", "TABLE"))
> + COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_tables, NULL);
> /* ALTER PUBLICATION <name> DROP */
>
> I am not sure about this one- is that change even related to the
> Col-Filter patch or is this some unrelated bugfix?
>

Yeah, seems unrelated - possibly from a rebase or something. Removed.

> ~~~
>
> 29. src/include/catalog/pg_publication.h
>
> @@ -86,6 +86,7 @@ typedef struct Publication
> typedef struct PublicationRelInfo
> {
> Relation relation;
> + List *columns;
> } PublicationRelInfo;
>
> Perhaps that needs some comment. e.g. do you need to mention that a
> NIL List means all columns?
>

I added a short comment.

> ~~~
>
> 30. src/include/nodes/parsenodes.h
>
> @@ -3642,6 +3642,7 @@ typedef struct PublicationTable
> {
> NodeTag type;
> RangeVar *relation; /* relation to be published */
> + List *columns; /* List of columns in a publication table */
> } PublicationTable;
>
>
> That comment "List of columns in a publication table" doesn't really
> say anything helpful.
>
> Perhaps it should mention that a NIL List means all table columns?
>

Not sure, seems fine.

> ~~~
>
> 31. src/test/regress/sql/publication.sql
>
> The regression test file has an uncommon mixture of /* */ and -- style comments.
>
> Perhaps change all the /* */ ones?
>

Yeah, that needs some cleanup. I haven't done anything about it yet.

> ~~~
>
> 32. src/test/regress/sql/publication.sql
>
> +CREATE TABLE testpub_tbl5 (a int PRIMARY KEY, b text, c text,
> + d int generated always as (a + length(b)) stored);
> +ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, x); -- error
> +ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (b, c); -- error
> +ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, d); -- error
> +ALTER PUBLICATION testpub_fortable ADD TABLE testpub_tbl5 (a, c); -- ok
>
> For all these tests (and more) there seems not sufficient explanation
> comments to say exactly what each test case is testing, e.g. *why* is
> an "error" expected for some cases but "ok" for others.
>

Not sure. I think the error is generally obvious in the expected output.

> ~~~
>
> 33. src/test/regress/sql/publication.sql
>
> "-- no dice"
>
> (??) confusing comment.
>

Same as for the errors.

> ~~~
>
> 34. src/test/subscription/t/028_column_list.pl
>
> I think a few more comments in this TAP file would help to make the
> purpose of the tests more clear.
>

Yeah, the 0004 patch I shared a couple minutes ago does exactly that.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2022-02-15 23:58:45 USE_BARRIER_SMGRRELEASE on Linux?
Previous Message Andres Freund 2022-02-15 23:51:57 Re: Race conditions in 019_replslot_limit.pl