RE: Column Filtering in Logical Replication

From: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Rahila Syed <rahilasyed90(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: Column Filtering in Logical Replication
Date: 2021-12-17 09:46:14
Message-ID: OS0PR01MB5716330FFE3803DF887D073C94789@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Friday, December 17, 2021 1:55 AM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> On 2021-Dec-16, houzj(dot)fnst(at)fujitsu(dot)com wrote:
>
> > The patch ensures all columns of RT are in column list when
> > CREATE/ALTER publication, but it seems doesn't prevent user from
> > changing the replica identity or dropping the index used in replica
> > identity. Do we also need to check those cases ?
>
> Yes, we do. As it happens, I spent a couple of hours yesterday writing code for
> that, at least partially. I haven't yet checked what happens with cases like
> REPLICA NOTHING, or REPLICA INDEX <xyz> and then dropping that index.
>
> My initial ideas were a bit wrong BTW: I thought we should check the
> combination of column lists in all publications (a bitwise-OR of column bitmaps,
> so to speak). But conceptually that's wrong: we need to check the column list
> of each publication individually instead. Otherwise, if you wanted to hide a
> column from some publication but that column was part of the replica identity,
> there'd be no way to identify the tuple in the replica. (Or, if the pgouput code
> disobeys the column list and sends the replica identity even if it's not in the
> column list, then you'd be potentially publishing data that you wanted to hide.)

Thanks for the explanation.

Apart from ALTER REPLICA IDENTITY and DROP INDEX, I think there could be
some other cases we need to handle for the replica identity check:

1)
When adding a partitioned table with column list to the publication, I think we
need to check the RI of all its leaf partition. Because the RI on the partition
is the one actually takes effect.

2)
ALTER TABLE ADD PRIMARY KEY;
ALTER TABLE DROP CONSTRAINT "PRIMAEY KEY";

If the replica identity is default, it will use the primary key. we might also
need to prevent user from adding or removing primary key in this case.

Based on the above cases, the RI check seems could bring considerable amount of
code. So, how about we follow what we already did in CheckCmdReplicaIdentity(),
we can put the check for RI in that function, so that we can cover all the
cases and reduce the code change. And if we are worried about the cost of do
the check for UPDATE and DELETE every time, we can also save the result in the
relcache. It's safe because every operation change the RI will invalidate the
relcache. We are using this approach in row filter patch to make sure all
columns in row filter expression are part of RI.

Best regards,
Hou zj

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2021-12-17 09:53:48 Re: Skipping logical replication transactions on subscriber side
Previous Message Peter Eisentraut 2021-12-17 09:21:04 Re: Addition of --no-sync to pg_upgrade for test speedup