Re: row filtering for logical replication

From: Greg Nancarrow <gregn4422(at)gmail(dot)com>
To: Ajin Cherian <itsajin(at)gmail(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, japin <japinli(at)hotmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, David Steele <david(at)pgmasters(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: row filtering for logical replication
Date: 2021-12-17 11:59:28
Message-ID: CAJcOf-fi0oSC69ZN2f7cVp9S6uFN1itHaJwf4dNoS7mDbWnt_g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Dec 17, 2021 at 7:20 PM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
>
> On Fri, Dec 17, 2021 at 5:46 PM Greg Nancarrow <gregn4422(at)gmail(dot)com> wrote:
>
> > So using the v47 patch-set, I still find that the UPDATE above results in publication of an INSERT of (2,1), rather than an UPDATE of (1,1) to (2,1).
> > This is according to the 2nd UPDATE rule below, from patch 0003.
> >
> > + * old-row (no match) new-row (no match) -> (drop change)
> > + * old-row (no match) new row (match) -> INSERT
> > + * old-row (match) new-row (no match) -> DELETE
> > + * old-row (match) new row (match) -> UPDATE
> >
> > This is because the old row (1,1) doesn't match the UPDATE filter "(a>1)", but the new row (2,1) does.
> > This functionality doesn't seem right to me. I don't think it can be assumed that (1,1) was never published (and thus requires an INSERT rather than UPDATE) based on these checks, because in this example, (1,1) was previously published via a different operation - INSERT (and using a different filter too).
> > I think the fundamental problem here is that these UPDATE rules assume that the old (current) row was previously UPDATEd (and published, or not published, according to the filter applicable to UPDATE), but this is not necessarily the case.
> > Or am I missing something?
>
> But it need not be correct in assuming that the old-row was part of a
> previous INSERT either (and published, or not published according to
> the filter applicable to an INSERT).
> For example, change the sequence of inserts and updates prior to the
> last update:
>
> truncate tbl1 ;
> insert into tbl1 values (1,5); ==> not replicated since insert and ! (b < 2);
> update tbl1 set b = 1; ==> not replicated since update and ! (a > 1)
> update tbl1 set a = 2; ==> replicated and update converted to insert
> since (a > 1)
>
> In this case, the last update "update tbl1 set a = 2; " is updating a
> row that was previously updated and not inserted and not replicated to
> the subscriber.
> How does the replication logic differentiate between these two cases,
> and decide if the update was previously published or not?
> I think it's futile for the publisher side to try and figure out the
> history of published rows. In fact, if this level of logic is required
> then it is best implemented on the subscriber side, which then defeats
> the purpose of a publication filter.
>

I think it's a concern, for such a basic example with only one row,
getting unpredictable (and even wrong) replication results, depending
upon the order of operations.
Doesn't this problem result from allowing different WHERE clauses for
different pubactions for the same table?
My current thoughts are that this shouldn't be allowed, and also WHERE
clauses for INSERTs should, like UPDATE and DELETE, be restricted to
using only columns covered by the replica identity or primary key.

Regards,
Greg Nancarrow
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Rahila Syed 2021-12-17 12:09:37 Re: Column Filtering in Logical Replication
Previous Message Peter Eisentraut 2021-12-17 11:34:36 Re: Adding CI to our tree