RE: Column Filtering in Logical Replication

From: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>
Subject: RE: Column Filtering in Logical Replication
Date: 2022-03-09 10:12:01
Message-ID: OS0PR01MB571614E4A39DD33669D5AFD6940A9@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wednesday, March 9, 2022 6:04 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> On Mon, Mar 7, 2022 at 8:48 PM Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> >
> > On 3/4/22 11:42, Amit Kapila wrote:
> >
> > > *
> > > Fetching column filter info in tablesync.c is quite expensive. It
> > > seems to be using four round-trips to get the complete info whereas
> > > for row-filter we use just one round trip. I think we should try to
> > > get both row filter and column filter info in just one round trip.
> > >
> >
> > Maybe, but I really don't think this is an issue.
> >
>
> I am not sure but it might matter for small tables. Leaving aside the
> performance issue, I think the current way will get the wrong column list in
> many cases: (a) The ALL TABLES IN SCHEMA case handling won't work for
> partitioned tables when the partitioned table is part of one schema and
> partition table is part of another schema. (b) The handling of partition tables in
> other cases will fetch incorrect lists as it tries to fetch the column list of all the
> partitions in the hierarchy.
>
> One of my colleagues has even tested these cases both for column filters and
> row filters and we find the behavior of row filter is okay whereas for column
> filter it uses the wrong column list. We will share the tests and results with you
> in a later email. We are trying to unify the column filter queries with row filter to
> make their behavior the same and will share the findings once it is done. I hope
> if we are able to achieve this that we will reduce the chances of bugs in this area.
>
> Note: I think the first two patches for tests are not required after commit
> ceb57afd3c.

Hi,

Here are some tests and results about the table sync query of
column filter patch and row filter.

1) multiple publications which publish schema of parent table and partition.
----pub
create schema s1;
create table s1.t (a int, b int, c int) partition by range (a);
create table t_1 partition of s1.t for values from (1) to (10);
create publication pub1 for all tables in schema s1;
create publication pub2 for table t_1(b);

----sub
- prepare tables
CREATE SUBSCRIPTION sub CONNECTION 'port=10000 dbname=postgres' PUBLICATION pub1, pub2;

When doing table sync for 't_1', the column list will be (b). I think it should
be no filter because table t_1 is also published via ALL TABLES IN SCHMEA
publication.

For Row Filter, it will use no filter for this case.

2) one publication publishes both parent and child
----pub
create table t (a int, b int, c int) partition by range (a);
create table t_1 partition of t for values from (1) to (10)
partition by range (a);
create table t_2 partition of t_1 for values from (1) to (10);

create publication pub2 for table t_1(a), t_2
with (PUBLISH_VIA_PARTITION_ROOT);

----sub
- prepare tables
CREATE SUBSCRIPTION sub CONNECTION 'port=10000 dbname=postgres' PUBLICATION pub2;

When doing table sync for table 't_1', it has no column list. I think the
expected column list is (a).

For Row Filter, it will use the row filter of the top most parent table(t_1) in
this case.

3) one publication publishes both parent and child
----pub
create table t (a int, b int, c int) partition by range (a);
create table t_1 partition of t for values from (1) to (10)
partition by range (a);
create table t_2 partition of t_1 for values from (1) to (10);

create publication pub2 for table t_1(a), t_2(b)
with (PUBLISH_VIA_PARTITION_ROOT);

----sub
- prepare tables
CREATE SUBSCRIPTION sub CONNECTION 'port=10000 dbname=postgres' PUBLICATION pub2;

When doing table sync for table 't_1', the column list would be (a, b). I think
the expected column list is (a).

For Row Filter, it will use the row filter of the top most parent table(t_1) in
this case.

Best regards,
Hou zj

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Borisov 2022-03-09 10:29:09 [PATCH] Double declaration in pg_upgrade.h
Previous Message Fabien COELHO 2022-03-09 10:08:57 Re: cpluspluscheck complains about use of register