Re: row filtering for logical replication

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Euler Taveira <euler(at)eulerto(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, japin <japinli(at)hotmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, David Steele <david(at)pgmasters(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: row filtering for logical replication
Date: 2022-02-10 10:18:22
Message-ID: CAA4eK1Kj1Frxo_L1edy40VzwWWRTyRt_QbnCLy21WcYL-DtdnA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Feb 10, 2022 at 9:29 AM houzj(dot)fnst(at)fujitsu(dot)com
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
>
> Attach the V80 patch which addressed the above comments and
> comments from Amit[1].
>

Thanks for the new version. Few minor/cosmetic comments:

1. Can we slightly change the below comment:
Before:
+ * To avoid fetching the publication information, we cache the publication
+ * actions and row filter validation information.

After:
To avoid fetching the publication information repeatedly, we cache the
publication actions and row filter validation information.

2.
+ /*
+ * For ordinary tables, make sure we don't copy data from child
+ * that inherits the named table.
+ */
+ if (lrel.relkind == RELKIND_RELATION)
+ appendStringInfoString(&cmd, " ONLY ");

I think we should mention the reason why we are doing so. So how about
something like: "For regular tables, make sure we don't copy data from
a child that inherits the named table as those will be copied
separately."

3.
Can we change the below comment?

Before:
+ /*
+ * Initialize the tuple slot, map and row filter that are only used
+ * when publishing inserts, updates or deletes.
+ */

After:
Initialize the tuple slot, map, and row filter. These are only used
when publishing inserts, updates, or deletes.

4.
+CREATE TABLE testpub_rf_tbl1 (a integer, b text);
+CREATE TABLE testpub_rf_tbl2 (c text, d integer);

Here, you can add a comment saying: "-- Test row filters" or something
on those lines.

5.
+-- test \d+ (now it displays filter information)
+SET client_min_messages = 'ERROR';
+CREATE PUBLICATION testpub_dplus_rf_yes FOR TABLE testpub_rf_tbl1
WHERE (a > 1) WITH (publish = 'insert');
+CREATE PUBLICATION testpub_dplus_rf_no FOR TABLE testpub_rf_tbl1;
+RESET client_min_messages;
+\d+ testpub_rf_tbl1
+ Table "public.testpub_rf_tbl1"
+ Column | Type | Collation | Nullable | Default | Storage | Stats
target | Description
+--------+---------+-----------+----------+---------+----------+--------------+-------------
+ a | integer | | | | plain | |
+ b | text | | | | extended | |
+Publications:
+ "testpub_dplus_rf_no"
+ "testpub_dplus_rf_yes" WHERE (a > 1)

I think here \d is sufficient to show row filters? I think it is
better to use table names such as testpub_rf_yes or testpub_rf_no in
this test.

6.
+# Similarly, the table filter for tab_rf_x (after the initial phase) has no
+# effect when combined with the ALL TABLES IN SCHEMA.
+# Expected: 5 initial rows + 2 new rows = 7 rows
+$node_publisher->safe_psql('postgres', "INSERT INTO tab_rf_x (x)
VALUES (-99), (99)");
+$node_publisher->wait_for_catchup($appname);
+$result = $node_subscriber->safe_psql('postgres', "SELECT count(x)
FROM tab_rf_x");
+is($result, qq(7), 'check table tab_rf_x should not be filtered');

I think the comment here should say "ALL TABLES." instead of "ALL
TABLES IN SCHEMA." as there is no publication before this test which
is created with "ALL TABLES IN SCHEMA" clause.

7.
+# The subscription of the ALL TABLES IN SCHEMA publication means
there should be
+# no filtering on the tablesync COPY, so all expect all 5 will be present.

It doesn't make sense to use 'all' twice in the above comment, the
first one can be removed.

8.
+
+# setup structure on publisher
+$node_publisher->safe_psql('postgres',

I think it will be good if we can add some generic comments explaining
the purpose of the tests following this. We can add "# Tests FOR TABLE
with row filter publications" before the current comment.

9. For the newly added test for tab_rowfilter_inherited, the patch has
a test case only for initial sync, can we add a test for replication
after initial sync for the same?

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Julien Rouhaud 2022-02-10 11:08:02 Re: Database-level collation version tracking
Previous Message Ashutosh Sharma 2022-02-10 09:45:16 Re: Identify missing publications from publisher while create/alter subscription.