Re: row filtering for logical replication

From: Greg Nancarrow <gregn4422(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Ajin Cherian <itsajin(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, japin <japinli(at)hotmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, David Steele <david(at)pgmasters(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: row filtering for logical replication
Date: 2021-12-20 00:37:15
Message-ID: CAJcOf-eH5PjuzSoxxOp3SX58bDFFE7=uvokhOPLCXP3Cqp5fyg@mail.gmail.com
Lists: pgsql-hackers

On Sat, Dec 18, 2021 at 1:33 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> >
> > I think it's a concern, for such a basic example with only one row,
> > getting unpredictable (and even wrong) replication results, depending
> > upon the order of operations.
> >
>
> I am not sure how we can deduce that. The results are based on current
> and new values of row which is what I think we are expecting here.
>

In the two simple cases presented, the publisher ends up with the same
single row (2,1), but in one of them the subscriber ends up with an
extra row (1,1) that the publisher doesn't have. So, by using a
"filter", a row has been published that the publisher itself doesn't
have. I'm not so sure a user would expect that. Not to mention that if
(1,1) is subsequently INSERTed on the publisher side, it will result
in a duplicate key error on the subscriber.

> > Doesn't this problem result from allowing different WHERE clauses for
> > different pubactions for the same table?
> > My current thoughts are that this shouldn't be allowed, and also WHERE
> > clauses for INSERTs should, like UPDATE and DELETE, be restricted to
> > using only columns covered by the replica identity or primary key.
> >
>
> Hmm, even if we do that one could have removed the insert row filter
> by the time we are evaluating the update. So, we will get the same
> result. I think the behavior in your example is as we expect as per
> the specs defined by the patch and I don't see any problem, in this
> case, w.r.t replication results. Let us see what others think on this?
>

Here I'm talking about the typical use-case of setting the
row-filtering WHERE clause up front and not changing it thereafter.
I think that dynamically changing filters after INSERT/UPDATE/DELETE
operations is not the typical use-case, and IMHO it's another matter
entirely (it could produce all kinds of unpredictable results).

Personally I think it would make more sense to:
1) Disallow different WHERE clauses on the same table, for different pubactions.
2) If only INSERTs are being published, allow any column in the WHERE
clause, otherwise (as for UPDATE and DELETE) restrict the referenced
columns to be part of the replica identity or primary key.
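
To make (2) concrete, here is a rough sketch of the check I have in
mind, written in Python purely for illustration. It is not code from
the patch; the function name and parameters are hypothetical. Passing
a single set of filter columns per table reflects (1), i.e. one WHERE
clause shared by all pubactions.

```python
from typing import Iterable, Set


def rejected_filter_columns(
    published_actions: Iterable[str],
    filter_columns: Set[str],
    replica_identity_columns: Set[str],
) -> Set[str]:
    """Return the WHERE-clause columns that proposal (2) would reject.

    Hypothetical helper, not part of the patch: it only models the rule
    described in this mail.
    """
    actions = {action.lower() for action in published_actions}

    # If neither UPDATE nor DELETE is published, only new tuples are
    # ever filtered, so any column may be referenced.
    if not actions & {"update", "delete"}:
        return set()

    # Otherwise every referenced column must be covered by the replica
    # identity (or primary key), since only those columns are
    # guaranteed to be available in the old tuple.
    return set(filter_columns) - set(replica_identity_columns)


if __name__ == "__main__":
    # Insert-only publication: filtering on a non-key column is allowed.
    print(rejected_filter_columns(["insert"], {"b"}, {"a"}))
    # Publishing updates too: column "b" is outside the replica identity.
    print(rejected_filter_columns(["insert", "update"], {"a", "b"}, {"a"}))
```

An empty result means the filter would be accepted under this rule;
a non-empty result lists the columns that would need to be removed
from the WHERE clause (or added to the replica identity).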

Regards,
Greg Nancarrow
Fujitsu Australia
