Re: row filtering for logical replication

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, japin <japinli(at)hotmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, David Steele <david(at)pgmasters(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: row filtering for logical replication
Date: 2021-12-08 06:03:37
Message-ID: CAHut+PuBdXGLw1+CBoNxXUp3bHcHcKYWHx1RSGF6tY5aSLu5ZA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Dec 2, 2021 at 2:59 PM houzj(dot)fnst(at)fujitsu(dot)com
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
...
> Attach the v44-0005 top-up patch.
> This version addressed all the comments received so far,
> mainly including the following changes:
> 1) rename rfcol_valid_for_replica to rfcol_valid
> 2) Remove the struct PublicationInfo and add the rfcol_valid flag directly in relation
> 3) report the invalid column number in the error message.
> 4) Rename some function to match the usage.
> 5) Fix some typos and add some code comments.
> 6) Fix a miss in testcase.

Below are my review comments for the most recent v44-0005 (top-up) patch:

======

1. src/backend/executor/execReplication.c
+ invalid_rfcol = RelationGetInvalRowFilterCol(rel);
+
+ /*
+ * It is only safe to execute UPDATE/DELETE when all columns of the row
+ * filters from publications which the relation is in are part of the
+ * REPLICA IDENTITY.
+ */
+ if (invalid_rfcol != InvalidAttrNumber)
+ {

It seemed confusing that when the invalid_rfcol is NOT invalid at all
then it is InvalidAttrNumber, so perhaps this code would be easier to
read if instead the condition was written just as:
---
if (invalid_rfcol)
{
...
}
---

====

2. invalid_rfcol var name
This variable name is used in a few places but I thought it was too
closely named with the "rfcol_valid" variable even though it has a
completely different meaning. IMO "invalid_rfcol" might be better
named "invalid_rfcolnum" or something like that to reinforce that it
is an AttributeNumber.

====

3. src/backend/utils/cache/relcache.c - function comment
+ * If not all the row filter columns are part of REPLICA IDENTITY, return the
+ * invalid column number, InvalidAttrNumber otherwise.
+ */

Minor rewording:
"InvalidAttrNumber otherwise." --> "otherwise InvalidAttrNumber."

====

4. src/backend/utils/cache/relcache.c - function name
+AttrNumber
+RelationGetInvalRowFilterCol(Relation relation)

IMO nothing was gained by saving 2 chars of the name.
"RelationGetInvalRowFilterCol" --> "RelationGetInvalidRowFilterCol"

====

5. src/backend/utils/cache/relcache.c
+/* For invalid_rowfilter_column_walker. */
+typedef struct {
+ AttrNumber invalid_rfcol;
+ Bitmapset *bms_replident;
+} rf_context;
+

The members should be commented.

====

6. src/include/utils/rel.h
/*
+ * true if the columns of row filters from all the publications the
+ * relation is in are part of replica identity.
+ */
+ bool rd_rfcol_valid;

I felt the member comment is not quite telling the full story. e.g.
IIUC this member is also true when pubaction is something other than
update/delete - but that case doesn't even do replica identity
checking at all. There might not even be any replica identity.

====

6. src/test/regress/sql/publication.sql
CREATE PUBLICATION testpub6 FOR TABLE rf_tbl_abcd_pk WHERE (a > 99);
+-- fail - "a" is in PK but it is not part of REPLICA IDENTITY INDEX
+update rf_tbl_abcd_pk set a = 1;
+DROP PUBLICATION testpub6;
-- ok - "c" is not in PK but it is part of REPLICA IDENTITY INDEX
-SET client_min_messages = 'ERROR';
CREATE PUBLICATION testpub6 FOR TABLE rf_tbl_abcd_pk WHERE (c > 99);
DROP PUBLICATION testpub6;
-RESET client_min_messages;
--- fail - "a" is not in REPLICA IDENTITY INDEX
CREATE PUBLICATION testpub6 FOR TABLE rf_tbl_abcd_nopk WHERE (a > 99);
+-- fail - "a" is not in REPLICA IDENTITY INDEX
+update rf_tbl_abcd_nopk set a = 1;

The "update" DML should be uppercase "UPDATE" for consistency with the
surrounding tests.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2021-12-08 06:08:37 Re: Data is copied twice when specifying both child and parent table in publication
Previous Message vignesh C 2021-12-08 06:00:27 Re: Data is copied twice when specifying both child and parent table in publication