Re: row filtering for logical replication

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Euler Taveira <euler(at)eulerto(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, japin <japinli(at)hotmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, David Steele <david(at)pgmasters(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: row filtering for logical replication
Date: 2022-01-27 07:28:30
Message-ID: CAHut+PsG1G80AoSYka7m1x05vHjKZAzKeVyK4b6CAm2-sTkadg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here are some review comments for v71-0001

~~~

1. Commit Message - database

"...that don't satisfy this WHERE clause will be filtered out. This allows a
database or set of tables to be partially replicated. The row filter is
per table. A new row filter can be added simply by specifying a WHERE..."

I don't know what extra information is conveyed by saying "a
database". Isn't it sufficient to just say "This allows a set of
tables to be partially replicated." ?

~~~

2. Commit message - OR'ed

The row filters are applied before publishing the changes. If the
subscription has several publications in which the same table has been
published with different filters, those expressions get OR'ed together so
that rows satisfying any of the expressions will be replicated.

Shouldn't that say:
"with different filters," --> "with different filters (for the same
publish operation),"

~~~

3. Commit message - typo

This means all the other filters become redundant if (a) one of the
publications have no filter at all, (b) one of the publications was
created using FOR ALL TABLES, (c) one of the publications was created
using FOR ALL TABLES IN SCHEMA and the table belongs to that same schema.

Typo:
"have no filter" --> "has no filter"

~~~

4. Commit message - psql \d+

"Psql commands \dRp+ and \d+ will display any row filters."

Actually, just "\d" (without +) will also display row filters. You do
not need to say "\d+"

~~~

5. src/backend/executor/execReplication.c - CheckCmdReplicaIdentity

+ RelationBuildPublicationDesc(rel, &pubdesc);
+ if (!pubdesc.rf_valid_for_update && cmd == CMD_UPDATE)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
+ errmsg("cannot update table \"%s\"",
+ RelationGetRelationName(rel)),
+ errdetail("Column \"%s\" used in the publication WHERE expression is
not part of the replica identity.",
+ get_attname(RelationGetRelid(rel),
+ pubdesc.invalid_rfcol_update,
+ false))));
+ else if (!pubdesc.rf_valid_for_delete && cmd == CMD_DELETE)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
+ errmsg("cannot delete from table \"%s\"",
+ RelationGetRelationName(rel)),
+ errdetail("Column \"%s\" used in the publication WHERE expression is
not part of the replica identity.",
+ get_attname(RelationGetRelid(rel),
+ pubdesc.invalid_rfcol_delete,
+ false))));

IMO those conditions should be reversed because (a) it's more optimal
to test the other way around, and (b) for consistency with other code
in this function.

BEFORE
+ if (!pubdesc.rf_valid_for_update && cmd == CMD_UPDATE)
...
+ else if (!pubdesc.rf_valid_for_delete && cmd == CMD_DELETE)
AFTER
+ if (cmd == CMD_UPDATE && !pubdesc.rf_valid_for_update)
...
+ else if (cmd == CMD_DELETE && !pubdesc.rf_valid_for_delete)

~~~

6. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter

+ /*
+ * Unchanged toasted replica identity columns are only logged in the
+ * old tuple, copy this over to the new tuple. The changed (or WAL
+ * Logged) toast values are always assembled in memory and set as
+ * VARTAG_INDIRECT. See ReorderBufferToastReplace.
+ */

Something seems not quite right with the comma in that first sentence.
Maybe a period is better?

BEFORE
Unchanged toasted replica identity columns are only logged in the old
tuple, copy this over to the new tuple.
AFTER
Unchanged toasted replica identity columns are only logged in the old
tuple. Copy this over to the new tuple.

~~~

7. src/test/subscription/t/028_row_filter.pl - COPYRIGHT

This TAP file should have a copyright comment that is consistent with
all the other TAP files.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2022-01-27 07:32:28 Re: Fix uninitialized variable access (src/backend/utils/mmgr/freepage.c)
Previous Message Kyotaro Horiguchi 2022-01-27 07:24:34 Re: TAP test to cover "EndOfLogTLI != replayTLI" case