Re: row filtering for logical replication

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Ajin Cherian <itsajin(at)gmail(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, japin <japinli(at)hotmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, David Steele <david(at)pgmasters(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: row filtering for logical replication
Date: 2021-11-30 10:34:28
Message-ID: CALDaNm2T3yXJkuKXARUUh+=_36Ry7gYxUqhpgW8AxECug9nH6Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Nov 30, 2021 at 12:33 PM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
>
> On Thu, Nov 25, 2021 at 2:22 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> >
> > Thanks for all the review comments so far! We are endeavouring to keep
> > pace with them.
> >
> > All feedback is being tracked and we will fix and/or reply to everything ASAP.
> >
> > Meanwhile, PSA the latest set of v42* patches.
> >
> > This version was mostly a patch restructuring exercise but it also
> > addresses some minor review comments in passing.
> >
>
> Addressed more review comments, in the attached patch-set v43. 5
> patches carried forward from v42.
> This patch-set contains the following fixes:
>
> On Tue, Nov 23, 2021 at 1:28 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > in pgoutput_row_filter, we are dropping the slots if there are some
> > old slots in the RelationSyncEntry. But then I noticed that in
> > rel_sync_cache_relation_cb(), also we are doing that but only for the
> > scantuple slot. So IMHO, rel_sync_cache_relation_cb(), is only place
> > setting entry->rowfilter_valid to false; so why not drop all the slot
> > that time only and in pgoutput_row_filter(), you can just put an
> > assert?
> >
>
> Moved all the dropping of slots to rel_sync_cache_relation_cb()
>
> > +static bool
> > +pgoutput_row_filter_virtual(Relation relation, TupleTableSlot *slot,
> > RelationSyncEntry *entry)
> > +{
> > + EState *estate;
> > + ExprContext *ecxt;
> >
> >
> > pgoutput_row_filter_virtual and pgoutput_row_filter are exactly same
> > except, ExecStoreHeapTuple(), so why not just put one check based on
> > whether a slot is passed or not, instead of making complete duplicate
> > copy of the function.
>
> Removed pgoutput_row_filter_virtual
>
> > oldctx = MemoryContextSwitchTo(CacheMemoryContext);
> > tupdesc = CreateTupleDescCopy(tupdesc);
> > entry->scantuple = MakeSingleTupleTableSlot(tupdesc, &TTSOpsHeapTuple);
> >
> > Why do we need to copy the tupledesc? do we think that we need to have
> > this slot even if we close the relation, if so can you add the
> > comments explaining why we are making a copy here.
>
> This code has been modified, and comments added.
>
> On Tue, Nov 23, 2021 at 8:02 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > One more thing related to this code:
> > pgoutput_row_filter()
> > {
> > ..
> > + if (!entry->rowfilter_valid)
> > {
> > ..
> > + oldctx = MemoryContextSwitchTo(CacheMemoryContext);
> > + tupdesc = CreateTupleDescCopy(tupdesc);
> > + entry->scantuple = MakeSingleTupleTableSlot(tupdesc, &TTSOpsHeapTuple);
> > + MemoryContextSwitchTo(oldctx);
> > ..
> > }
> >
> > Why do we need to initialize scantuple here unless we are sure that
> > the row filter is going to get associated with this relentry? I think
> > when there is no row filter then this allocation is not required.
> >
>
> Modified as suggested.
>
> On Tue, Nov 23, 2021 at 10:52 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > In 0003 patch, why is below change required?
> > --- a/src/backend/replication/pgoutput/pgoutput.c
> > +++ b/src/backend/replication/pgoutput/pgoutput.c
> > @@ -1,4 +1,4 @@
> > -/*-------------------------------------------------------------------------
> > +/*------------------------------------------------------------------------
> > *
> > * pgoutput.c
> >
>
> Removed.
>
> >
> > After above, rearrange the code in pgoutput_row_filter(), so that two
> > different checks related to 'rfisnull' (introduced by different
> > patches) can be combined as if .. else check.
> >
> Fixed.
>
> On Thu, Nov 25, 2021 at 12:03 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > + * If the new relation or the old relation has a where clause,
> > + * we need to remove it so that it can be added afresh later.
> > + */
> > + if (RelationGetRelid(newpubrel->relation) == oldrelid &&
> > + newpubrel->whereClause == NULL && rfisnull)
> >
> > Can't we use _equalPublicationTable() here? It compares the whereClause as well.
> >
>
> Tried this, can't do this because one is an alter statement while the
> other is a publication, the whereclause is not
> the same Nodetype. In the statement, the whereclause is T_A_Expr,
> while in the publication
> catalog, it is T_OpExpr.
>
> > /* Must be owner of the table or superuser. */
> > - if (!pg_class_ownercheck(RelationGetRelid(rel), GetUserId()))
> > + if (!pg_class_ownercheck(relid, GetUserId()))
> >
> > Here, you can directly use RelationGetRelid as was used in the
> > previous code without using an additional variable.
> >
>
> Fixed.
>
> > 2.
> > +typedef struct {
> > + Relation rel;
> > + bool check_replident;
> > + Bitmapset *bms_replident;
> > +}
> > +rf_context;
> >
> > Add rf_context in the same line where } ends.
>
> Code has been modified, this comment no longer applies.
>
> > 4.
> > + * Rules: Node-type validation
> > + * ---------------------------
> > + * Allow only simple or compound expressions like:
> > + * - "(Var Op Const)" or
> >
> > It seems Var Op Var is allowed. I tried below and it works:
> > create publication pub for table t1 where (c1 < c2) WITH (publish = 'insert');
> >
> > I think it should be okay to allow it provided we ensure that we never
> > access some other table/view etc. as part of the expression. Also, we
> > should document the behavior correctly.
>
> Fixed.
>
> On Wed, Nov 24, 2021 at 8:52 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> >
> > 4) This should be included in typedefs.list, also we could add some
> > comments for this structure
> > +typedef struct {
> > + Relation rel;
> > + Bitmapset *bms_replident;
> > +}
> > +rf_context;
>
> this has been removed in last patch, so comment no longer applies
>
> > 5) Few includes are not required. #include "miscadmin.h" not required
> > in pg_publication.c, #include "executor/executor.h" not required in
> > proto.c, #include "access/xact.h", #include "executor/executor.h" and
> > #include "replication/logicalrelation.h" not required in pgoutput.c
> >
>
> Optimized this. removed "executor/executor.h" from patch 0003, removed
> "access/xact.h" from patch 0001
> removed "replication/logicalrelation.h” from 0001. Others required.
>
> > 6) typo "filte" should be "filter":
> > +/*
> > + * The row filte walker checks that the row filter expression is legal.
> > + *
> > + * Rules: Node-type validation
> > + * ---------------------------
> > + * Allow only simple or compound expressions like:
> > + * - "(Var Op Const)" or
> > + * - "(Var Op Const) Bool (Var Op Const)"
>
> Fixed.

Thanks for the updated patch, few comments:
1) Should this be changed to include non IMMUTABLE system functions
are not allowed:
+ not-null constraints in the <literal>WHERE</literal> clause. The
+ <literal>WHERE</literal> clause does not allow functions or user-defined
+ operators.
+ </para>

2) We can remove the #if 0 code if we don't plan to keep it in the final patch.
--- a/src/backend/parser/parse_agg.c
+++ b/src/backend/parser/parse_agg.c
@@ -552,11 +552,12 @@ check_agglevels_and_constraints(ParseState
*pstate, Node *expr)

break;
case EXPR_KIND_PUBLICATION_WHERE:
+#if 0
if (isAgg)
err = _("aggregate functions are not
allowed in publication WHERE expressions");
else
err = _("grouping operations are not
allowed in publication WHERE expressions");
-
+#endif

3) Can a user remove the row filter without removing the table from
the publication after creating the publication or should the user drop
the table and add the table in this case?

4) Should this be changed, since we error out if publisher without
replica identify performs delete or update:
+ The <literal>WHERE</literal> clause must contain only columns that are
+ covered by <literal>REPLICA IDENTITY</literal>, or are part of the primary
+ key (when <literal>REPLICA IDENTITY</literal> is not set), otherwise
+ <command>DELETE</command> or <command>UPDATE</command> operations will not
+ be replicated. That's because old row is used and it only contains primary
+ key or columns that are part of the <literal>REPLICA IDENTITY</literal>; the
+ remaining columns are <literal>NULL</literal>. For <command>INSERT</command>

to:
+ The <literal>WHERE</literal> clause must contain only columns that are
+ covered by <literal>REPLICA IDENTITY</literal>, or are part of the primary
+ key (when <literal>REPLICA IDENTITY</literal> is not set), otherwise
+ <command>DELETE</command> or <command>UPDATE</command> operations will be
+ disallowed on those tables. That's because old row is used and it
only contains primary
+ key or columns that are part of the <literal>REPLICA IDENTITY</literal>; the
+ remaining columns are <literal>NULL</literal>. For <command>INSERT</command>

Regards,
Vignesh

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2021-11-30 10:52:24 Re: SSL Tests for sslinfo extension
Previous Message Dilip Kumar 2021-11-30 10:28:22 Re: row filtering for logical replication