Re: row filtering for logical replication

From: Ajin Cherian <itsajin(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, japin <japinli(at)hotmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, David Steele <david(at)pgmasters(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: row filtering for logical replication
Date: 2021-11-23 11:28:21
Message-ID: CAFPTHDamz7R=eP1thsYuAGjhywU8y9dOOg1KUAjXC30Jr+C6xg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Attaching a new patchset v41 which includes changes by both Peter and myself.

Patches v40-0005 and v40-0006 have been merged to create patch
v41-0005 which reduces the patches to 6 again.
This patch-set contains changes addressing the following review comments:

On Mon, Nov 15, 2021 at 5:48 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> What I meant was that with this new code we have regressed the old
> behavior. Basically, imagine a case where no filter was given for any
> of the tables. Then after the patch, we will remove all the old tables
> whereas before the patch it will remove the oldrels only when they are
> not specified as part of new rels. If you agree with this, then we can
> retain the old behavior and for the new tables, we can always override
> the where clause for a SET variant of command.

Fixed and modified the behaviour to match with what the schema patch
implemented.

On Mon, Nov 15, 2021 at 9:31 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> 2.
> + * The OptWhereClause (row-filter) must be stored here
> + * but it is valid only for tables. If the ColId was
> + * mistakenly not a table this will be detected later
> + * in preprocess_pubobj_list() and an error thrown.
>
> /error thrown/error is thrown

Fixed.
:
> 6. In rowfilter_expr_checker(), the expression tree is traversed
> twice, can't we traverse it once to detect all non-allowed stuff? It
> can be sometimes costly to traverse the tree multiple times especially
> when the expression is complex and it doesn't seem acceptable to do so
> unless there is some genuine reason for the same.
>

Fixed.

On Tue, Nov 16, 2021 at 7:24 PM Greg Nancarrow <gregn4422(at)gmail(dot)com> wrote:
>
> doc/src/sgml/ref/create_publication.sgml
> (1) improve comment
> + /* Set up a pstate to parse with */
>
> "pstate" is the variable name, better to use "ParseState".

Fixed.

> src/test/subscription/t/025_row_filter.pl
> (2) rename TAP test 025 to 026
> I suggest that the t/025_row_filter.pl TAP test should be renamed to
> 026 now because 025 is being used by some schema TAP test.
>

Fixed

On Tue, Nov 16, 2021 at 7:50 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> ---
> >If you choose to do the initial table synchronization, only data that satisfies
> >the row filters is sent.
>
> I think this comment is not correct, I think the correct statement
> would be "only data that satisfies the row filters is pulled by the
> subscriber"

Fixed

>
> I think this message is not correct, because for update also we can
> not have filters on the non-key attribute right? Even w.r.t the first
> patch also if the non update non key toast columns are there we can
> not apply filters on those. So this comment seems misleading to me.
>

Fixed

>
> - Oid relid = RelationGetRelid(targetrel->relation);
> ..
> + relid = RelationGetRelid(targetrel);
> +
>
> Why this change is required, I mean instead of fetching the relid
> during the variable declaration why do we need to do it separately
> now?
>

Fixed

> + if (expr == NULL)
> + ereport(ERROR,
> + (errcode(ERRCODE_CANNOT_COERCE),
> + errmsg("row filter returns type %s that cannot be
> coerced to the expected type %s",
>
> Instead of "coerced to" can we use "cast to"? That will be in sync
> with other simmilar kind od user exposed error message.
> ----

Fixed

>
> I can see the caller of this function is already switching to
> CacheMemoryContext, so what is the point in doing it again here?
> Maybe if called is expected to do show we can Asssert on the
> CurrentMemoryContext.
>

Fixed.

On Thu, Nov 18, 2021 at 9:36 PM Greg Nancarrow <gregn4422(at)gmail(dot)com> wrote:
>
> (2) missing test case
> It seems that the current tests are not testing the
> multiple-row-filter case (n_filters > 1) in the following code in
> pgoutput_row_filter_init():
>
> rfnode = n_filters > 1 ? makeBoolExpr(OR_EXPR, rfnodes, -1) :
> linitial(rfnodes);
>
> I think a test needs to be added similar to the customers+countries
> example that Tomas gave (where there is a single subscription to
> multiple publications of the same table, each of which has a
> row-filter).

Test case added.

On Fri, Nov 19, 2021 at 4:15 PM Greg Nancarrow <gregn4422(at)gmail(dot)com> wrote:
>
> I notice that in the 0001 patch, it adds a "relid" member to the
> PublicationRelInfo struct:
>
> src/include/catalog/pg_publication.h
>
> typedef struct PublicationRelInfo
> {
> + Oid relid;
> Relation relation;
> + Node *whereClause;
> } PublicationRelInfo;
>
> It appears that this new member is not actually required, as the relid
> can be simply obtained from the existing "relation" member - using the
> RelationGetRelid() macro.

Fixed.

On Mon, Nov 22, 2021 at 12:44 PM Greg Nancarrow <gregn4422(at)gmail(dot)com> wrote:
>
> Another thing I noticed was in the 0004 patch, list_free_deep() should
> be used instead of list_free() in the following code block, otherwise
> the rfnodes themselves (allocated by stringToNode()) are not freed:
>
> src/backend/replication/pgoutput/pgoutput.c
>
> + if (rfnodes)
> + {
> + list_free(rfnodes);
> + rfnodes = NIL;
> + }

Fixed.

We will be addressing the rest of the comments in the next patch.

regards,
Ajin Cherian
Fujitsu Australia

Attachment Content-Type Size
v41-0003-PS-ExprState-cache-modifications.patch application/octet-stream 13.2 KB
v41-0002-PS-Add-tab-auto-complete-support-for-the-Row-Fil.patch application/octet-stream 2.4 KB
v41-0005-PS-Row-filter-validation-walker.patch application/octet-stream 33.9 KB
v41-0004-PS-Combine-multiple-filters-with-OR-instead-of-A.patch application/octet-stream 14.0 KB
v41-0001-Row-filter-for-logical-replication.patch application/octet-stream 79.8 KB
v41-0006-Support-updates-based-on-old-and-new-tuple-in-ro.patch application/octet-stream 22.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2021-11-23 11:52:32 Re: row filtering for logical replication
Previous Message Michael Paquier 2021-11-23 10:33:56 Re: logical decoding/replication: new functions pg_ls_logicaldir and pg_ls_replslotdir