Re: row filtering for logical replication

From: Sascha Kuhl <yogidabanli(at)gmail(dot)com>
To: Euler Taveira <euler(at)eulerto(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, houzj(dot)fnst(at)fujitsu(dot)com, vignesh C <vignesh21(at)gmail(dot)com>, tanghy(dot)fnst(at)fujitsu(dot)com, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, japin <japinli(at)hotmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, David Steele <david(at)pgmasters(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: row filtering for logical replication
Date: 2021-12-04 07:21:47
Message-ID: CAPvVvKBNp2+wpkR4GpUQeYA7DmaVkSpW-2HH5t2m_9f4pbjw+g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

This is great work thanks for the Realisation Update.

Euler Taveira <euler(at)eulerto(dot)com> schrieb am Sa., 4. Dez. 2021, 00:13:

> On Thu, Dec 2, 2021, at 4:18 AM, Peter Smith wrote:
>
> PSA a new v44* patch set.
>
> We are actively developing this feature for some months and we improved
> this
> feature a lot. This has been a good team work. It seems a good time to
> provide
> a retrospective for this feature based on the consensus we reached until
> now.
>
> The current design has one row filter per publication-table mapping. It
> allows
> flexible choices while using the same table for multiple replication
> purposes.
> The WHERE clause was chosen as the syntax to declare the row filter
> expression
> (enclosed by parentheses).
>
> There was a lot of discussion about which columns are allowed to use in
> the row
> filter expression. The consensus was that publications that publish UPDATE
> and/or DELETE operations, should check if the columns in the row filter
> expression is part of the replica identity. Otherwise, these DML operations
> couldn't be replicated.
>
> We also discussed about which expression would be allowed. We couldn't
> allow
> all kind of expressions because the way logical decoding infrastructure was
> designed, some expressions could break the replication. Hence, we decided
> to
> allow only "simple expressions". By "simple expression", we mean to
> restrict
> (a) user-defined objects (functions, operators, types) and (b) immutable
> builtin functions.
>
> A subscription can subscribe to multiple publications. These publication
> can
> publish the same table. In this case, we have to combine the row filter
> expression to decide if the row will be replicated or not. The consensus
> was to
> replicate a row if any of the row filters returns true. It means that if
> one
> publication-table mapping does not have a row filter, the row will be
> replicated. There is an optimization for this case that provides an empty
> expression for this table. Hence, it bails out and replicate the row
> without
> running the row filter code.
>
> The same logic applies to the initial table synchronization if there are
> multiple row filters. Copy all rows that satisfies at least one row filter
> expression. If the subscriber is a pre-15 version, data synchronization
> won't
> use row filters if they are defined in the publisher.
>
> If we are dealing with partitioned tables, the publication parameter
> publish_via_partition_root determines if it uses the partition row filter
> (false) or the root partitioned table row filter (true).
>
> I used the last patch series (v44) posted by Peter Smith [1]. I did a lot
> of
> improvements in this new version (v45). I merged 0001 (it is basically the
> main
> patch I wrote) and 0004 (autocomplete). As I explained in [2], I
> implemented a
> patch (that is incorporated in the v45-0001) to fix this issue. I saw that
> Peter already proposed a slightly different patch (0006). I read this
> patch and
> concludes that it would be better to keep the version I have. It fixes a
> few
> things and also includes more comments. I attached another patch (v45-0002)
> that includes the expression validation. It is based on 0002. I completely
> overhaul it. There are additional expressions that was not supported by the
> previous version (such as conditional expressions [CASE, COALESCE, NULLIF,
> GREATEST, LEAST], array operators, XML operators). I probably didn't
> finish the
> supported node list (there are a few primitive nodes that need to be
> checked).
> However, the current "simple expression" routine seems promising. I plan to
> integrate v45-0002 in the next patch version. I attached it here for
> comparison
> purposes only.
>
> My next step is to review 0003. As I said before it would like to treat it
> as a
> separate feature. I know that it is useful for data consistency but this
> patch
> is already too complex. Having said that, I didn't include it in this patch
> series because it doesn't apply cleanly. If Ajin would like to provide a
> new
> version, I would appreciate.
>
> PS> I will update the commit message in the next version. I barely changed
> the
> documentation to reflect the current behavior. I probably missed some
> changes
> but I will fix in the next version.
>
>
> [1]
> https://postgr.es/m/CAHut%2BPtJnnM8MYQDf7xCyFAp13U_0Ya2dv-UQeFD%3DghixFLZiw%40mail.gmail.com
> [2]
> https://postgr.es/m/ca8d270d-f930-4d15-9f24-60f95b364173%40www.fastmail.com
>
>
> --
> Euler Taveira
> EDB https://www.enterprisedb.com/
>
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message osumi.takamichi@fujitsu.com 2021-12-04 13:01:58 RE: Failed transaction statistics to measure the logical replication progress
Previous Message Thomas Munro 2021-12-04 05:18:32 Re: stat() vs ERROR_DELETE_PENDING, round N + 1