Re: Column Filtering in Logical Replication

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, shiy(dot)fnst(at)fujitsu(dot)com
Subject: Re: Column Filtering in Logical Replication
Date: 2022-03-09 10:03:40
Message-ID: CAA4eK1JJ5qgqbjfYQU9F4R1+OABxec-DQechQXus0fanDjaAVg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Mar 7, 2022 at 8:48 PM Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
> On 3/4/22 11:42, Amit Kapila wrote:
> > On Wed, Mar 2, 2022 at 5:43 PM Tomas Vondra
> > <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> >>
> >> Attached is an updated patch, addressing most of the issues reported so
> >> far. There are various minor tweaks, but the main changes are:
> > ...
> >>
> >> 3) checks of column filter vs. publish_via_partition_root and replica
> >> identity, following the same logic as the row-filter patch (hopefully,
> >> it touches the same places, using the same logic, ...)
> >>
> >> That means - with "publish_via_partition_root=false" it's not allowed to
> >> specify column filters on partitioned tables, only for leaf partitions.
> >>
> >> And we check column filter vs. replica identity when adding tables to
> >> publications, or whenever we change the replica identity.
> >>
> >
> > This handling is different from row filter work and I see problems
> > with it.
>
> By different, I assume you mean I tried to enfoce the rules in ALTER
> PUBLICATION and other ALTER commands, instead of when modifying the
> data?
>

Yes.

> OK, I reworked this to do the same thing as the row filtering patch.
>

Thanks, I'll check this.

> > The column list validation w.r.t primary key (default replica
> > identity) is missing. The handling of column list vs. partitions has
> > multiple problems: (a) In attach partition, the patch is just checking
> > ancestors for RI validation but what if the table being attached has
> > further subpartitions; (b) I think the current locking also seems to
> > have problems because it is quite possible that while it validates the
> > ancestors here, concurrently someone changes the column list. I think
> > it won't be enough to just change the locking mode because with the
> > current patch strategy during attach, we will be first taking locks
> > for child tables of current partition and then parent tables which can
> > pose deadlock hazards.
> > > The columns list validation also needs to be done when we change
> > publication action.
> >
> I believe those issues should be solved by adopting the same approach as
> the row-filtering patch, right?
>

Right.

>
> > Some other miscellaneous comments:
> > =============================
> > *
> > In get_rel_sync_entry(), the handling for partitioned tables doesn't
> > seem to be correct. It can publish a different set of columns based on
> > the order of publications specified in the subscription.
> >
> > For example:
> > ----
> > create table parent (a int, b int, c int) partition by range (a);
> > create table test_part1 (like parent);
> > alter table parent attach partition test_part1 for values from (1) to (10);
> >
> > create publication pub for table parent(a) with (PUBLISH_VIA_PARTITION_ROOT);
> > create publication pub2 for table test_part1(b);
> > ---
> >
> > Now, depending on the order of publications in the list while defining
> > subscription, the column list will change
> > ----
> > create subscription sub connection 'port=10000 dbname=postgres'
> > publication pub, pub2;
> >
> > For the above, column list will be: (a)
> >
> > create subscription sub connection 'port=10000 dbname=postgres'
> > publication pub2, pub;
> >
> > For this one, the column list will be: (a, b)
> > ----
> >
> > To avoid this, the column list should be computed based on the final
> > publish_as_relid as we are doing for the row filter.
> >
>
> Hmm, yeah. That seems like a genuine problem - it should not depend on
> the order of publications in the subscription, I guess.
>
> But is it an issue in the patch? Isn't that a pre-existing issue? AFAICS
> the problem is that we initialize publish_as_relid=relid before the loop
> over publications, and then just update it. So the first iteration
> starts with relid, but the second iteration ends with whatever value is
> set by the first iteration (e.g. the root).
>
> So with the example you posted, we start with
>
> publish_as_relid = relid = test_part1
>
> but then if the first publication is pubviaroot=true, we update it to
> parent. And in the second iteration, we fail to find the column filter,
> because "parent" (publish_as_relid) is not part of the pub2.
>
> If we do it in the other order, we leave the publish_as_relid value as
> is (and find the filter), and then update it in the second iteration
> (and find the column filter too).
>
> Now, this can be resolved by re-calculating the publish_as_relid from
> scratch in each iteration (start with relid, then maybe update it). But
> that's just half the story - the issue is there even without column
> filters. Consider this example:
>
> create table t (a int, b int, c int) partition by range (a);
>
> create table t_1 partition of t for values from (1) to (10)
> partition by range (a);
>
> create table t_2 partition of t_1 for values from (1) to (10);
>
> create publication pub1 for table t(a)
> with (PUBLISH_VIA_PARTITION_ROOT);
>
> create publication pub2 for table t_1(a)
> with (PUBLISH_VIA_PARTITION_ROOT);
>
>
> Now, is you change subscribe to "pub1, pub2" and "pub2, pub1", we'll end
> up with different publish_as_relid values (t or t_1). Which seems like
> the same ambiguity issue.
>

I think we should fix this existing problem by always using the
top-most table as publish_as_relid. Basically, we can check, if the
existing publish_as_relid is an ancestor of a new rel that is going to
replace it then we shouldn't replace it. However, I think even if we
fix the existing problem, we will still have the order problem for the
column filter patch, and to avoid that instead of fetching column
filters in the publication loop, we should use the final
publish_as_relid. I think it will have another problem as well if we
don't use final publish_as_relid which is that sometimes when we
should not use any filter (say when pubviaroot is true and that
publication has root partitioned table which has no filter) as per our
rule of filters for a partitioned table, it can still use some filter
from the non-root table.

>
> > *
> > Fetching column filter info in tablesync.c is quite expensive. It
> > seems to be using four round-trips to get the complete info whereas
> > for row-filter we use just one round trip. I think we should try to
> > get both row filter and column filter info in just one round trip.
> >
>
> Maybe, but I really don't think this is an issue.
>

I am not sure but it might matter for small tables. Leaving aside the
performance issue, I think the current way will get the wrong column
list in many cases: (a) The ALL TABLES IN SCHEMA case handling won't
work for partitioned tables when the partitioned table is part of one
schema and partition table is part of another schema. (b) The handling
of partition tables in other cases will fetch incorrect lists as it
tries to fetch the column list of all the partitions in the hierarchy.

One of my colleagues has even tested these cases both for column
filters and row filters and we find the behavior of row filter is okay
whereas for column filter it uses the wrong column list. We will share
the tests and results with you in a later email. We are trying to
unify the column filter queries with row filter to make their behavior
the same and will share the findings once it is done. I hope if we are
able to achieve this that we will reduce the chances of bugs in this
area.

Note: I think the first two patches for tests are not required after
commit ceb57afd3c.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message torikoshia 2022-03-09 10:04:49 Re: RFC: Logging plan of the running query
Previous Message Peter Eisentraut 2022-03-09 09:29:17 Re: Time to drop plpython2?