Re: row filtering for logical replication

From: japin <japinli(at)hotmail(dot)com>
To: Euler Taveira <euler(at)eulerto(dot)com>
Cc: onderkalaci(at)gmail(dot)com, david(at)pgmasters(dot)net, Craig Ringer <craig(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Amit Langote <amitlangote09(at)gmail(dot)com>, movead li <movead(dot)li(at)highgo(dot)ca>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: row filtering for logical replication
Date: 2021-02-01 09:11:33
Message-ID: ME3P282MB16678160F5EA09DCB962474EB6B69@ME3P282MB1667.AUSP282.PROD.OUTLOOK.COM
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On Mon, 01 Feb 2021 at 08:23, Euler Taveira <euler(at)eulerto(dot)com> wrote:
> On Mon, Mar 16, 2020, at 10:58 AM, David Steele wrote:
>> Please submit to a future CF when a new patch is available.
> Hi,
>
> This is another version of the row filter patch. Patch summary:
>
> 0001: refactor to remove dead code
> 0002: grammar refactor for row filter
> 0003: core code, documentation, and tests
> 0004: psql code
> 0005: pg_dump support
> 0006: debug messages (only for test purposes)
> 0007: measure row filter overhead (only for test purposes)
>

Thanks for updating the patch. Here are some comments:

(1)
+ <para>
+ If this parameter is <literal>false</literal>, it uses the
+ <literal>WHERE</literal> clause from the partition; otherwise,the
+ <literal>WHERE</literal> clause from the partitioned table is used.
</para>

otherwise,the -> otherwise, the

(2)
+ <para>
+ Columns used in the <literal>WHERE</literal> clause must be part of the
+ primary key or be covered by <literal>REPLICA IDENTITY</literal> otherwise
+ <command>UPDATE</command> and <command>DELETE</command> operations will not
+ be replicated.
+ </para>
+

IMO we should indent one space here.

(3)
+
+ <para>
+ The <literal>WHERE</literal> clause expression is executed with the role used
+ for the replication connection.
+ </para>

Same as (2).

The documentation says:

> Columns used in the <literal>WHERE</literal> clause must be part of the
> primary key or be covered by <literal>REPLICA IDENTITY</literal> otherwise
> <command>UPDATE</command> and <command>DELETE</command> operations will not
> be replicated.

Why we need this limitation? Am I missing something?

When I tested, I find that the UPDATE can be replicated, while the DELETE
cannot be replicated. Here is my test-case:

-- 1. Create tables and publications on publisher
CREATE TABLE t1 (a int primary key, b int);
CREATE TABLE t2 (a int primary key, b int);
INSERT INTO t1 VALUES (1, 11);
INSERT INTO t2 VALUES (1, 11);
CREATE PUBLICATION mypub1 FOR TABLE t1;
CREATE PUBLICATION mypub2 FOR TABLE t2 WHERE (b > 10);

-- 2. Create tables and subscriptions on subscriber
CREATE TABLE t1 (a int primary key, b int);
CREATE TABLE t2 (a int primary key, b int);
CREATE SUBSCRIPTION mysub1 CONNECTION 'host=localhost port=8765 dbname=postgres' PUBLICATION mypub1;
CREATE SUBSCRIPTION mysub2 CONNECTION 'host=localhost port=8765 dbname=postgres' PUBLICATION mypub2;

-- 3. Check publications on publisher
postgres=# \dRp+
Publication mypub1
Owner | All tables | Inserts | Updates | Deletes | Truncates | Via root
-------+------------+---------+---------+---------+-----------+----------
japin | f | t | t | t | t | f
Tables:
"public.t1"

Publication mypub2
Owner | All tables | Inserts | Updates | Deletes | Truncates | Via root
-------+------------+---------+---------+---------+-----------+----------
japin | f | t | t | t | t | f
Tables:
"public.t2" WHERE (b > 10)

-- 4. Check initialization data on subscriber
postgres=# table t1;
a | b
---+----
1 | 11
(1 row)

postgres=# table t2;
a | b
---+----
1 | 11
(1 row)

-- 5. The update on publisher
postgres=# update t1 set b = 111 where b = 11;
UPDATE 1
postgres=# table t1;
a | b
---+-----
1 | 111
(1 row)

postgres=# update t2 set b = 111 where b = 11;
UPDATE 1
postgres=# table t2;
a | b
---+-----
1 | 111
(1 row)

-- 6. check the updated records on subscriber
postgres=# table t1;
a | b
---+-----
1 | 111
(1 row)

postgres=# table t2;
a | b
---+-----
1 | 111
(1 row)

-- 7. Delete records on publisher
postgres=# delete from t1 where b = 111;
DELETE 1
postgres=# table t1;
a | b
---+---
(0 rows)

postgres=# delete from t2 where b = 111;
DELETE 1
postgres=# table t2;
a | b
---+---
(0 rows)

-- 8. Check the deleted records on subscriber
postgres=# table t1;
a | b
---+---
(0 rows)

postgres=# table t2;
a | b
---+-----
1 | 111
(1 row)

I do a simple debug, and find that the pgoutput_row_filter() return false when I
execute "delete from t2 where b = 111;".

Does the publication only load the REPLICA IDENTITY columns into oldtuple when we
execute DELETE? So the pgoutput_row_filter() cannot find non REPLICA IDENTITY
columns, which cause it return false, right? If that's right, the UPDATE might
not be limitation by REPLICA IDENTITY, because all columns are in newtuple,
isn't it?

--
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hou, Zhijie 2021-02-01 09:18:39 RE: Parallel INSERT (INTO ... SELECT ...)
Previous Message wenjing 2021-02-01 08:49:10 Re: [Proposal] Global temporary tables