Re: row filtering for logical replication

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Ajin Cherian <itsajin(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, japin <japinli(at)hotmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, David Steele <david(at)pgmasters(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: row filtering for logical replication
Date: 2021-09-24 21:37:20
Message-ID: 4b213b91-8428-85a7-7ed6-48ceefd7f7be@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 9/24/21 8:09 AM, Amit Kapila wrote:
> On Thu, Sep 23, 2021 at 6:03 PM Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>>
>> 13) turning update into insert
>>
>> I agree with Ajin Cherian [4] that looking at just old or new row for
>> updates is not the right solution, because each option will "break" the
>> replica in some case. So I think the goal "keeping the replica in sync"
>> is the right perspective, and converting the update to insert/delete if
>> needed seems appropriate.
>>
>> This seems a somewhat similar to what pglogical does, because that may
>> also convert updates (although only to inserts, IIRC) when handling
>> replication conflicts. The difference is pglogical does all this on the
>> subscriber, while this makes the decision on the publisher.
>>
>> I wonder if this might have some negative consequences, or whether
>> "moving" this to downstream would be useful for other purposes in the
>> fuure (e.g. it might be reused for handling other conflicts).
>>
>
> Apart from additional traffic, I am not sure how will we handle all
> the conditions on subscribers, say if the new row doesn't match, how
> will subscribers know about this unless we pass row_filter or some
> additional information along with tuple. Previously, I have done some
> research and shared in one of the emails above that IBM's InfoSphere
> Data Replication [1] performs filtering in this way which also
> suggests that we won't be off here.
>

I'm certainly not suggesting what we're doing is wrong. Given the design
of built-in logical replication it makes sense doing it this way, I was
just thinking aloud about what we might want to do in the future (e.g.
pglogical uses this to deal with conflicts between multiple sources, and
so on).

>>
>>
>> 15) pgoutput_row_filter initializing filter
>>
>> I'm not sure I understand why the filter initialization gets moved from
>> get_rel_sync_entry. Presumably, most of what the replication does is
>> replicating rows, so I see little point in not initializing this along
>> with the rest of the rel_sync_entry.
>>
>
> Sorry, IIRC, this has been suggested by me and I thought it was best
> to do any expensive computation the first time it is required. I have
> shared few cases like in [2] where it would lead to additional cost
> without any gain. Unless I am missing something, I don't see any
> downside of doing it in a delayed fashion.
>

Not sure, but the arguments presented there seem a bit wonky ...

Yes, the work would be wasted if we discard the cached data without
using it (it might happen for truncate, I'm not sure). But how likely is
it that such operations happen *in isolation*? I'd bet the workload is
almost never just a stream of truncates - there are always some
operations in between that would actually use this.

Similarly for the errors - IIRC hitting an error means the replication
restarts, which is orders of magnitude more expensive than anything we
can save by this delayed evaluation.

I'd keep it simple, for the sake of simplicity of the whole patch.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2021-09-24 21:37:48 Re: Proposal: Save user's original authenticated identity for logging
Previous Message Tomas Vondra 2021-09-24 20:54:40 Re: Column Filtering in Logical Replication