Re: row filtering for logical replication

From: Ajin Cherian <itsajin(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, japin <japinli(at)hotmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, David Steele <david(at)pgmasters(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: row filtering for logical replication
Date: 2021-11-30 07:03:40
Message-ID: CAFPTHDa9sB6MXZ0erDnjLX-rMfAf1UR9Kzr8yqDx38VJ5j_sMQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Nov 25, 2021 at 2:22 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Thanks for all the review comments so far! We are endeavouring to keep
> pace with them.
>
> All feedback is being tracked and we will fix and/or reply to everything ASAP.
>
> Meanwhile, PSA the latest set of v42* patches.
>
> This version was mostly a patch restructuring exercise but it also
> addresses some minor review comments in passing.
>

Addressed more review comments, in the attached patch-set v43. 5
patches carried forward from v42.
This patch-set contains the following fixes:

On Tue, Nov 23, 2021 at 1:28 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> in pgoutput_row_filter, we are dropping the slots if there are some
> old slots in the RelationSyncEntry. But then I noticed that in
> rel_sync_cache_relation_cb(), also we are doing that but only for the
> scantuple slot. So IMHO, rel_sync_cache_relation_cb(), is only place
> setting entry->rowfilter_valid to false; so why not drop all the slot
> that time only and in pgoutput_row_filter(), you can just put an
> assert?
>

Moved all the dropping of slots to rel_sync_cache_relation_cb()

> +static bool
> +pgoutput_row_filter_virtual(Relation relation, TupleTableSlot *slot,
> RelationSyncEntry *entry)
> +{
> + EState *estate;
> + ExprContext *ecxt;
>
>
> pgoutput_row_filter_virtual and pgoutput_row_filter are exactly same
> except, ExecStoreHeapTuple(), so why not just put one check based on
> whether a slot is passed or not, instead of making complete duplicate
> copy of the function.

Removed pgoutput_row_filter_virtual

> oldctx = MemoryContextSwitchTo(CacheMemoryContext);
> tupdesc = CreateTupleDescCopy(tupdesc);
> entry->scantuple = MakeSingleTupleTableSlot(tupdesc, &TTSOpsHeapTuple);
>
> Why do we need to copy the tupledesc? do we think that we need to have
> this slot even if we close the relation, if so can you add the
> comments explaining why we are making a copy here.

This code has been modified, and comments added.

On Tue, Nov 23, 2021 at 8:02 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> One more thing related to this code:
> pgoutput_row_filter()
> {
> ..
> + if (!entry->rowfilter_valid)
> {
> ..
> + oldctx = MemoryContextSwitchTo(CacheMemoryContext);
> + tupdesc = CreateTupleDescCopy(tupdesc);
> + entry->scantuple = MakeSingleTupleTableSlot(tupdesc, &TTSOpsHeapTuple);
> + MemoryContextSwitchTo(oldctx);
> ..
> }
>
> Why do we need to initialize scantuple here unless we are sure that
> the row filter is going to get associated with this relentry? I think
> when there is no row filter then this allocation is not required.
>

Modified as suggested.

On Tue, Nov 23, 2021 at 10:52 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> In 0003 patch, why is below change required?
> --- a/src/backend/replication/pgoutput/pgoutput.c
> +++ b/src/backend/replication/pgoutput/pgoutput.c
> @@ -1,4 +1,4 @@
> -/*-------------------------------------------------------------------------
> +/*------------------------------------------------------------------------
> *
> * pgoutput.c
>

Removed.

>
> After above, rearrange the code in pgoutput_row_filter(), so that two
> different checks related to 'rfisnull' (introduced by different
> patches) can be combined as if .. else check.
>
Fixed.

On Thu, Nov 25, 2021 at 12:03 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> + * If the new relation or the old relation has a where clause,
> + * we need to remove it so that it can be added afresh later.
> + */
> + if (RelationGetRelid(newpubrel->relation) == oldrelid &&
> + newpubrel->whereClause == NULL && rfisnull)
>
> Can't we use _equalPublicationTable() here? It compares the whereClause as well.
>

Tried this, can't do this because one is an alter statement while the
other is a publication, the whereclause is not
the same Nodetype. In the statement, the whereclause is T_A_Expr,
while in the publication
catalog, it is T_OpExpr.

> /* Must be owner of the table or superuser. */
> - if (!pg_class_ownercheck(RelationGetRelid(rel), GetUserId()))
> + if (!pg_class_ownercheck(relid, GetUserId()))
>
> Here, you can directly use RelationGetRelid as was used in the
> previous code without using an additional variable.
>

Fixed.

> 2.
> +typedef struct {
> + Relation rel;
> + bool check_replident;
> + Bitmapset *bms_replident;
> +}
> +rf_context;
>
> Add rf_context in the same line where } ends.

Code has been modified, this comment no longer applies.

> 4.
> + * Rules: Node-type validation
> + * ---------------------------
> + * Allow only simple or compound expressions like:
> + * - "(Var Op Const)" or
>
> It seems Var Op Var is allowed. I tried below and it works:
> create publication pub for table t1 where (c1 < c2) WITH (publish = 'insert');
>
> I think it should be okay to allow it provided we ensure that we never
> access some other table/view etc. as part of the expression. Also, we
> should document the behavior correctly.

Fixed.

On Wed, Nov 24, 2021 at 8:52 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> 4) This should be included in typedefs.list, also we could add some
> comments for this structure
> +typedef struct {
> + Relation rel;
> + Bitmapset *bms_replident;
> +}
> +rf_context;

this has been removed in last patch, so comment no longer applies

> 5) Few includes are not required. #include "miscadmin.h" not required
> in pg_publication.c, #include "executor/executor.h" not required in
> proto.c, #include "access/xact.h", #include "executor/executor.h" and
> #include "replication/logicalrelation.h" not required in pgoutput.c
>

Optimized this. removed "executor/executor.h" from patch 0003, removed
"access/xact.h" from patch 0001
removed "replication/logicalrelation.h” from 0001. Others required.

> 6) typo "filte" should be "filter":
> +/*
> + * The row filte walker checks that the row filter expression is legal.
> + *
> + * Rules: Node-type validation
> + * ---------------------------
> + * Allow only simple or compound expressions like:
> + * - "(Var Op Const)" or
> + * - "(Var Op Const) Bool (Var Op Const)"

Fixed.

regards,
Ajin Cherian
Fujitsu Australia

Attachment Content-Type Size
v43-0001-Row-filter-for-logical-replication.patch application/octet-stream 84.0 KB
v43-0003-Support-updates-based-on-old-and-new-tuple-in-ro.patch application/octet-stream 19.4 KB
v43-0002-PS-Row-filter-validation-walker.patch application/octet-stream 33.4 KB
v43-0005-cache-the-result-of-row-filter-column-validation.patch application/octet-stream 27.3 KB
v43-0004-Tab-auto-complete-and-pgdump-support-for-Row-Fil.patch application/octet-stream 5.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2021-11-30 07:37:57 Re: rand48 replacement
Previous Message Amul Sul 2021-11-30 07:00:41 Update stale code comment in CheckpointerMain()