Re: Column Filtering in Logical Replication

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>
Subject: Re: Column Filtering in Logical Replication
Date: 2022-03-10 18:17:15
Message-ID: e4b0fb99-3d74-7333-1e71-74f990b641ee@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 3/9/22 11:12, houzj(dot)fnst(at)fujitsu(dot)com wrote:
> Hi,
>
> Here are some tests and results about the table sync query of
> column filter patch and row filter.
>
> 1) multiple publications which publish schema of parent table and partition.
> ----pub
> create schema s1;
> create table s1.t (a int, b int, c int) partition by range (a);
> create table t_1 partition of s1.t for values from (1) to (10);
> create publication pub1 for all tables in schema s1;
> create publication pub2 for table t_1(b);
>
> ----sub
> - prepare tables
> CREATE SUBSCRIPTION sub CONNECTION 'port=10000 dbname=postgres' PUBLICATION pub1, pub2;
>
> When doing table sync for 't_1', the column list will be (b). I think it should
> be no filter because table t_1 is also published via ALL TABLES IN SCHMEA
> publication.
>
> For Row Filter, it will use no filter for this case.
>
>
> 2) one publication publishes both parent and child
> ----pub
> create table t (a int, b int, c int) partition by range (a);
> create table t_1 partition of t for values from (1) to (10)
> partition by range (a);
> create table t_2 partition of t_1 for values from (1) to (10);
>
> create publication pub2 for table t_1(a), t_2
> with (PUBLISH_VIA_PARTITION_ROOT);
>
> ----sub
> - prepare tables
> CREATE SUBSCRIPTION sub CONNECTION 'port=10000 dbname=postgres' PUBLICATION pub2;
>
> When doing table sync for table 't_1', it has no column list. I think the
> expected column list is (a).
>
> For Row Filter, it will use the row filter of the top most parent table(t_1) in
> this case.
>
>
> 3) one publication publishes both parent and child
> ----pub
> create table t (a int, b int, c int) partition by range (a);
> create table t_1 partition of t for values from (1) to (10)
> partition by range (a);
> create table t_2 partition of t_1 for values from (1) to (10);
>
> create publication pub2 for table t_1(a), t_2(b)
> with (PUBLISH_VIA_PARTITION_ROOT);
>
> ----sub
> - prepare tables
> CREATE SUBSCRIPTION sub CONNECTION 'port=10000 dbname=postgres' PUBLICATION pub2;
>
> When doing table sync for table 't_1', the column list would be (a, b). I think
> the expected column list is (a).
>
> For Row Filter, it will use the row filter of the top most parent table(t_1) in
> this case.
>

Attached is an updated patch version, addressing all of those issues.

0001 is a bugfix, reworking how we calculate publish_as_relid. The old
approach was unstable with multiple publications, giving different
results depending on order of the publications. This should be
backpatched into PG13 where publish_via_partition_root was introduced, I
think.

0002 is the main patch, merging the changes proposed by Peter and fixing
the issues reported here. In most cases this means adopting the code
used for row filters, and perhaps simplifying it a bit.

But I also tried to implement a row-filter test for 0001, and I'm not
sure I understand the behavior I observe. Consider this:

-- a chain of 3 partitions (on both publisher and subscriber)
CREATE TABLE test_part_rf (a int primary key, b int, c int)
PARTITION BY LIST (a);

CREATE TABLE test_part_rf_1
PARTITION OF test_part_rf FOR VALUES IN (1,2,3,4,5)
PARTITION BY LIST (a);

CREATE TABLE test_part_rf_2
PARTITION OF test_part_rf_1 FOR VALUES IN (1,2,3,4,5);

-- initial data
INSERT INTO test_part_rf VALUES (1, 5, 100);
INSERT INTO test_part_rf VALUES (2, 15, 200);

-- two publications, each adding a different partition
CREATE PUBLICATION test_pub_part_1 FOR TABLE test_part_rf_1
WHERE (b < 10) WITH (publish_via_partition_root);

CREATE PUBLICATION test_pub_part_2 FOR TABLE test_part_rf_2
WHERE (b > 10) WITH (publish_via_partition_root);

-- now create the subscription (also try opposite ordering)
CREATE SUBSCRIPTION test_part_sub CONNECTION '...'
PUBLICATION test_pub_part_1, test_pub_part_2;

-- wait for sync

-- inert some more data
INSERT INTO test_part_rf VALUES (3, 6, 300);
INSERT INTO test_part_rf VALUES (4, 16, 400);

-- wait for catchup

Now, based on the discussion here, my expectation is that we'll use the
row filter from the top-most ancestor in any publication, which in this
case is test_part_rf_1. Hence the filter should be (b < 10).

So I'd expect these rows to be replicated:

1,5,100
3,6,300

But that's not what I get, unfortunately. I get different results,
depending on the order of publications:

1) test_pub_part_1, test_pub_part_2

1|5|100
2|15|200
3|6|300
4|16|400

2) test_pub_part_2, test_pub_part_1

3|6|300
4|16|400

That seems pretty bizarre, because it either means we're not enforcing
any filter or some strange combination of filters (notice that for (2)
we skip/replicate rows matching either filter).

I have to be missing something important, but this seems confusing.
There's a patch adding a simple test case to 028_row_filter.sql (named
.txt, so as not to confuse cfbot).

FWIW I'm not convinced applying just the filters (both row and column)
is the right approach. It might be OK for a single publication, but with
multiple publications not so much. If you list multiple publications for
a subscription, it seems natural to expect a union of all the data, a
bit as if there were multiple subscriptions. But what you actually get
is some subset, depending on what other relations the other publications
include.

Of course, this only happens if the publications include different
ancestors. If all include the same ancestor, everything works fine and
you get the "union" of data.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
row-filter-test.txt text/plain 3.1 KB
0001-fixup-publish_as_relid-20220310.patch text/x-patch 2.6 KB
0002-Allow-specifying-column-filters-for-logical-20220310.patch text/x-patch 163.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2022-03-10 18:20:10 Re: Column Filtering in Logical Replication
Previous Message Robert Haas 2022-03-10 18:11:43 Re: role self-revocation