RE: bogus: logical replication rows/cols combinations

From: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: bogus: logical replication rows/cols combinations
Date: 2022-04-27 10:08:12
Message-ID: OS0PR01MB5716B82315A067F1D78F247E94FA9@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wednesday, April 27, 2022 12:56 PM From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Tue, Apr 26, 2022 at 4:00 AM Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> >
> > On 4/25/22 17:48, Alvaro Herrera wrote:
> >
> > > The desired result on subscriber is:
> > >
> > > table uno;
> > > a │ b │ c
> > > ────┼───┼───
> > > 1 │ 2 │
> > > -1 │ │ 4
> > >
> > >
> > > Thoughts?
> > >
> >
> > I'm not quite sure which of the two behaviors is more "desirable". In a
> > way, it's somewhat similar to publish_as_relid, which is also calculated
> > not considering which of the row filters match?
> >
>
> Right, or in other words, we check all publications to decide it and
> similar is the case for publication actions which are also computed
> independently for all publications.
>
> > But maybe you're right and it should behave the way you propose ... the
> > example I have in mind is a use case replicating table with two types of
> > rows - sensitive and non-sensitive. For sensitive, we replicate only
> > some of the columns, for non-sensitive we replicate everything. Which
> > could be implemented as two publications
> >
> > create publication sensitive_rows
> > for table t (a, b) where (is_sensitive);
> >
> > create publication non_sensitive_rows
> > for table t where (not is_sensitive);
> >
> > But the way it's implemented now, we'll always replicate all columns,
> > because the second publication has no column list.
> >
> > Changing this to behave the way you expect would be quite difficult,
> > because at the moment we build a single OR expression from all the row
> > filters. We'd have to keep the individual expressions, so that we can
> > build a column list for each of them (in order to ignore those that
> > don't match).
> >
> > We'd have to remove various other optimizations - for example we can't
> > just discard row filters if we found "no_filter" publication.
> >
>
> I don't think that is the right way. We need some way to combine
> expressions and I feel the current behavior is sane. I mean to say
> that even if there is one publication that has no filter (column/row),
> we should publish all rows with all columns. Now, as mentioned above
> combining row filters or column lists for all publications appears to
> be consistent with what we already do and seems correct behavior to
> me.
>
> To me, it appears that the method used to decide whether a particular
> table is published or not is also similar to what we do for row
> filters or column lists. Even if there is one publication that
> publishes all tables, we consider the current table to be published
> irrespective of whether other publications have published that table
> or not.
>
> > Or more
> > precisely, we'd have to consider column lists too.
> >
> > In other words, we'd have to merge pgoutput_column_list_init into
> > pgoutput_row_filter_init, and then modify pgoutput_row_filter to
> > evaluate the row filters one by one, and build the column list.
> >
>
> Hmm, I think even if we want to do something here, we also need to
> think about how to achieve similar behavior for initial tablesync
> which will be more tricky.

I think it could be difficult to make the initial tablesync behave the same.
Currently, we make a "COPY" command to do the table sync, I am not sure
how to change the "COPY" query to achieve the expected behavior here.

BTW, For the use case mentioned here:
"""
replicating table with two types of
rows - sensitive and non-sensitive. For sensitive, we replicate only
some of the columns, for non-sensitive we replicate everything.
"""

One approach to do this is to create two subscriptions and two
publications which seems a workaround.
-----
create publication uno for table uno (a, b) where (a > 0);
create publication dos for table uno (a, c) where (a < 0);

create subscription sub_uno connection 'port=55432 dbname=alvherre' publication uno;
create subscription sub_dos connection 'port=55432 dbname=alvherre' publication dos;
-----

Best regards,
Hou zj

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Пантюшин Александр Иванович 2022-04-27 10:17:38 Re: Wrong rows count in EXPLAIN
Previous Message Magnus Hagander 2022-04-27 10:01:03 Re: BUG #17448: In Windows 10, version 1703 and later, huge_pages doesn't work.