RE: row filtering for logical replication

From: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, japin <japinli(at)hotmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, David Steele <david(at)pgmasters(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: row filtering for logical replication
Date: 2021-11-30 01:39:05
Message-ID: OS0PR01MB57166F44723AB76866F74A0694679@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Nov 29, 2021 6:11 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Mon, Nov 29, 2021 at 12:10 PM Greg Nancarrow <gregn4422(at)gmail(dot)com>
> wrote:
> >
> > On Fri, Nov 26, 2021 at 12:40 AM houzj(dot)fnst(at)fujitsu(dot)com
> > <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> > >
> > > When researching and writing a top-up patch about this.
> > > I found a possible issue which I'd like to confirm first.
> > >
> > > It's possible the table is published in two publications A and B,
> > > publication A only publish "insert" , publication B publish
> > > "update". When UPDATE, both row filter in A and B will be executed. Is this
> behavior expected?
> > >
> > > For example:
> > > ---- Publication
> > > create table tbl1 (a int primary key, b int); create publication A
> > > for table tbl1 where (b<2) with(publish='insert'); create
> > > publication B for table tbl1 where (a>1) with(publish='update');
> > >
> > > ---- Subscription
> > > create table tbl1 (a int primary key); CREATE SUBSCRIPTION sub
> > > CONNECTION 'dbname=postgres host=localhost port=10000'
> PUBLICATION
> > > A,B;
> > >
> > > ---- Publication
> > > update tbl1 set a = 2;
> > >
> > > The publication can be created, and when UPDATE, the rowfilter in A
> > > (b<2) will also been executed but the column in it is not part of replica
> identity.
> > > (I am not against this behavior just confirm)
> > >
> >
> > There seems to be problems related to allowing the row filter to
> > include columns that are not part of the replica identity (in the case
> > of publish=insert).
> > In your example scenario, the tbl1 WHERE clause "(b < 2)" for
> > publication A, that publishes inserts only, causes a problem, because
> > column "b" is not part of the replica identity.
> > To see this, follow the simple example below:
> > (and note, for the Subscription, the provided tbl1 definition has an
> > error, it should also include the 2nd column "b int", same as in the
> > publisher)
> >
> > ---- Publisher:
> > INSERT INTO tbl1 VALUES (1,1);
> > UPDATE tbl1 SET a = 2;
> >
> > Prior to the UPDATE above:
> > On pub side, tbl1 contains (1,1).
> > On sub side, tbl1 contains (1,1)
> >
> > After the above UPDATE:
> > On pub side, tbl1 contains (2,1).
> > On sub side, tbl1 contains (1,1), (2,1)
> >
> > So the UPDATE on the pub side has resulted in an INSERT of (2,1) on
> > the sub side.
> >
> > This is because when (1,1) is UPDATEd to (2,1), it attempts to use the
> > "insert" filter "(b<2)" to determine whether the old value had been
> > inserted (published to subscriber), but finds there is no "b" value
> > (because it only uses RI cols for UPDATE) and so has to assume the old
> > tuple doesn't exist on the subscriber, hence the UPDATE ends up doing
> > an INSERT.
> > INow if the use of RI cols were enforced for the insert filter case,
> > we'd properly know the answer as to whether the old row value had been
> > published and it would have correctly performed an UPDATE instead of
> > an INSERT in this case.
> >
>
> I don't think it is a good idea to combine the row-filter from the publication
> that publishes just 'insert' with the row-filter that publishes 'updates'. We
> shouldn't apply the 'insert' filter for 'update' and similarly for publication
> operations. We can combine the filters when the published operations are the
> same. So, this means that we might need to cache multiple row-filters but I
> think that is better than having another restriction that publish operation
> 'insert'
> should also honor RI columns restriction.

Personally, I agreed that an UPDATE operation should only apply a row filter that
is part of a publication that has only UPDATE.

Best regards,
Hou zj

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2021-11-30 02:09:28 Re: Fix typos
Previous Message Greg Nancarrow 2021-11-30 01:15:32 Re: Correct handling of blank/commented lines in PSQL interactive-mode history