Re: bogus: logical replication rows/cols combinations

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: bogus: logical replication rows/cols combinations
Date: 2022-04-27 10:33:11
Message-ID: CAA4eK1+9LCX3qKyo9S4uaaciM0WqZrP6m=kMzufgpuv=a=PgXQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Apr 27, 2022 at 3:13 PM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>
> On 2022-Apr-26, Tomas Vondra wrote:
>
> > I'm not quite sure which of the two behaviors is more "desirable". In a
> > way, it's somewhat similar to publish_as_relid, which is also calculated
> > not considering which of the row filters match?
>
> I grepped doc/src/sgml for `publish_as_relid` and found no hits, so
> I suppose it's not a user-visible feature as such.
>

`publish_as_relid` is computed based on 'publish_via_partition_root'
setting of publication which is a user-visible feature.

> > But maybe you're right and it should behave the way you propose ... the
> > example I have in mind is a use case replicating table with two types of
> > rows - sensitive and non-sensitive. For sensitive, we replicate only
> > some of the columns, for non-sensitive we replicate everything.
>
> Exactly. If we blindly publish row/column values that aren't in *any*
> publications, this may lead to leaking protected values.
>
> > Changing this to behave the way you expect would be quite difficult,
> > because at the moment we build a single OR expression from all the row
> > filters. We'd have to keep the individual expressions, so that we can
> > build a column list for each of them (in order to ignore those that
> > don't match).
>
> I think we should do that, yeah.
>

This can hit the performance as we need to evaluate each expression
for each row.

> > I can take a stab at it, but it seems strange to not apply the same
> > logic to evaluation of publish_as_relid. I wonder what Amit thinks about
> > this, as he wrote the row filter stuff.
>
> By grepping publicationcmds.c, it seems that publish_as_relid refers to
> the ancestor partitioned table that is used for column list and
> rowfilter determination, when a partition is being published as part of
> it.
>

Yeah, this is true when the corresponding publication has set
'publish_via_partition_root' as true.

> I don't think these things are exactly parallel.
>

Currently, when the subscription has multiple publications, we combine
the objects, and actions of those publications. It happens for
'publish_via_partition_root', publication actions, tables, column
lists, or row filters. I think the whole design works on this idea
even the initial table sync. I think it might need a major change
(which I am not sure about at this stage) if we want to make the
initial sync also behave similar to what you are proposing.

I feel it would be much easier to create two different subscriptions
as mentioned by Hou-San [1] for the case you are talking about if the
user really needs something like that.

> ... In fact I think they are quite orthogonal: probably you should be
> able to publish a partitioned table in two publications, with different
> rowfilters and different column lists (which can come from the
> topmost partitioned table), and each partition should still work in the
> way I describe above.
>

We consider the column lists or row filters for either the partition
(on which the current operation is performed) or partitioned table
based on 'publish_via_partition_root' parameter of publication.

[1] - https://www.postgresql.org/message-id/OS0PR01MB5716B82315A067F1D78F247E94FA9%40OS0PR01MB5716.jpnprd01.prod.outlook.com

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrey Borodin 2022-04-27 10:39:00 Re: Wrong rows count in EXPLAIN
Previous Message Bharath Rupireddy 2022-04-27 10:22:06 Re: pgsql: Add contrib/pg_walinspect.