Re: row filtering for logical replication

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Ajin Cherian <itsajin(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, japin <japinli(at)hotmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, David Steele <david(at)pgmasters(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: row filtering for logical replication
Date: 2021-09-24 06:09:31
Message-ID: CAA4eK1KrEFzFc42EvdNVpFRE9sWnQq1Gswpm9ewhKGy5vnrbUw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Sep 23, 2021 at 6:03 PM Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
> 13) turning update into insert
>
> I agree with Ajin Cherian [4] that looking at just old or new row for
> updates is not the right solution, because each option will "break" the
> replica in some case. So I think the goal "keeping the replica in sync"
> is the right perspective, and converting the update to insert/delete if
> needed seems appropriate.
>
> This seems somewhat similar to what pglogical does, because that may
> also convert updates (although only to inserts, IIRC) when handling
> replication conflicts. The difference is pglogical does all this on the
> subscriber, while this makes the decision on the publisher.
>
> I wonder if this might have some negative consequences, or whether
> "moving" this to downstream would be useful for other purposes in the
> future (e.g. it might be reused for handling other conflicts).
>

Apart from the additional traffic, I am not sure how we would handle all
the conditions on subscribers. Say the new row doesn't match the filter:
how will subscribers know about this unless we pass the row_filter or
some additional information along with the tuple? Previously, I did some
research and shared in one of the emails above that IBM's InfoSphere
Data Replication [1] performs filtering in this way, which also
suggests that we are not off track here.
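
To make the publisher-side decision in point 13 concrete, here is a
minimal sketch in Python. It is only an illustration of the idea of
checking both the old and the new row, not the patch's C code; the
names Action and transform_update and the dict-based rows are
hypothetical.

```python
from enum import Enum
from typing import Callable, Dict

Row = Dict[str, int]
RowFilter = Callable[[Row], bool]


class Action(Enum):
    UPDATE = "update"
    INSERT = "insert"
    DELETE = "delete"
    SKIP = "skip"


def transform_update(old_row: Row, new_row: Row, row_filter: RowFilter) -> Action:
    """Decide what the publisher sends for an UPDATE (hypothetical helper)."""
    old_match = row_filter(old_row)
    new_match = row_filter(new_row)
    if old_match and new_match:
        return Action.UPDATE  # row stays inside the filtered set
    if new_match:
        return Action.INSERT  # row enters the set, replica has not seen it
    if old_match:
        return Action.DELETE  # row leaves the set, replica must drop it
    return Action.SKIP        # row was never visible to the replica
```

Because both matches are evaluated on the publisher, the subscriber
only receives a plain insert, update, or delete and needs no knowledge
of the filter expression.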

>
>
> 15) pgoutput_row_filter initializing filter
>
> I'm not sure I understand why the filter initialization gets moved from
> get_rel_sync_entry. Presumably, most of what the replication does is
> replicating rows, so I see little point in not initializing this along
> with the rest of the rel_sync_entry.
>

Sorry, IIRC, this was suggested by me; I thought it best to do any
expensive computation only the first time it is required. I have
shared a few cases, such as in [2], where initializing the filter up
front would add cost without any gain. Unless I am missing something,
I don't see any downside to doing it in a delayed fashion.
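
The delayed initialization can be sketched roughly as follows. This is
an illustrative Python model only, not the pgoutput code; the class
RelSyncEntry here, its build_filter callback, and the build_count
counter are hypothetical stand-ins.

```python
from typing import Callable, Dict, Optional

Row = Dict[str, int]
RowFilter = Callable[[Row], bool]


class RelSyncEntry:
    """Per-relation cache entry whose row filter is built on first use."""

    def __init__(self, relid: int, build_filter: Callable[[], RowFilter]):
        self.relid = relid
        self._build_filter = build_filter  # the expensive step, deferred
        self._filter: Optional[RowFilter] = None
        self.build_count = 0               # only here to observe the behavior

    def matches(self, row: Row) -> bool:
        # Build the filter the first time a row actually has to be checked.
        if self._filter is None:
            self._filter = self._build_filter()
            self.build_count += 1
        return self._filter(row)
```

In this model, creating the entry costs nothing beyond storing the
callback, and an entry that never has to evaluate a row never pays for
building the filter.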

[1] - https://www.ibm.com/docs/en/idr/11.4.0?topic=rows-search-conditions
[2] - https://www.postgresql.org/message-id/CAA4eK1JBHo2U2sZemFdJmcwEinByiJVii8wzGCDVMxOLYB3CUw%40mail.gmail.com

--
With Regards,
Amit Kapila.
