Re: row filtering for logical replication

From: Greg Nancarrow <gregn4422(at)gmail(dot)com>
To: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, japin <japinli(at)hotmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, David Steele <david(at)pgmasters(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: row filtering for logical replication
Date: 2021-11-29 06:40:19
Message-ID: CAJcOf-dz0srExG0NPPgXh5X8eL2uxk7C=cZoGTbf8cNqoRUY6w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Nov 26, 2021 at 12:40 AM houzj(dot)fnst(at)fujitsu(dot)com
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> When researching and writing a top-up patch about this.
> I found a possible issue which I'd like to confirm first.
>
> It's possible the table is published in two publications A and B, publication A
> only publish "insert" , publication B publish "update". When UPDATE, both row
> filter in A and B will be executed. Is this behavior expected?
>
> For example:
> ---- Publication
> create table tbl1 (a int primary key, b int);
> create publication A for table tbl1 where (b<2) with(publish='insert');
> create publication B for table tbl1 where (a>1) with(publish='update');
>
> ---- Subscription
> create table tbl1 (a int primary key);
> CREATE SUBSCRIPTION sub CONNECTION 'dbname=postgres host=localhost
> port=10000' PUBLICATION A,B;
>
> ---- Publication
> update tbl1 set a = 2;
>
> The publication can be created, and when UPDATE, the rowfilter in A (b<2) will
> also been executed but the column in it is not part of replica identity.
> (I am not against this behavior just confirm)
>

There seems to be problems related to allowing the row filter to
include columns that are not part of the replica identity (in the case
of publish=insert).
In your example scenario, the tbl1 WHERE clause "(b < 2)" for
publication A, that publishes inserts only, causes a problem, because
column "b" is not part of the replica identity.
To see this, follow the simple example below:
(and note, for the Subscription, the provided tbl1 definition has an
error, it should also include the 2nd column "b int", same as in the
publisher)

---- Publisher:
INSERT INTO tbl1 VALUES (1,1);
UPDATE tbl1 SET a = 2;

Prior to the UPDATE above:
On pub side, tbl1 contains (1,1).
On sub side, tbl1 contains (1,1)

After the above UPDATE:
On pub side, tbl1 contains (2,1).
On sub side, tbl1 contains (1,1), (2,1)

So the UPDATE on the pub side has resulted in an INSERT of (2,1) on
the sub side.

This is because when (1,1) is UPDATEd to (2,1), it attempts to use the
"insert" filter "(b<2)" to determine whether the old value had been
inserted (published to subscriber), but finds there is no "b" value
(because it only uses RI cols for UPDATE) and so has to assume the old
tuple doesn't exist on the subscriber, hence the UPDATE ends up doing
an INSERT.
INow if the use of RI cols were enforced for the insert filter case,
we'd properly know the answer as to whether the old row value had been
published and it would have correctly performed an UPDATE instead of
an INSERT in this case.
Thoughts?

Regards,
Greg Nancarrow
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2021-11-29 06:49:00 Re: Synchronizing slots from primary to standby
Previous Message Bharath Rupireddy 2021-11-29 06:18:50 Re: Rationalizing declarations of src/common/ variables