RE: row filtering for logical replication

From: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, japin <japinli(at)hotmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, David Steele <david(at)pgmasters(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: row filtering for logical replication
Date: 2022-01-11 02:16:05
Message-ID: OS0PR01MB57164BD1F23A83C1601880A994519@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jan 10, 2022 2:37 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Mon, Jan 10, 2022 at 8:41 AM houzj(dot)fnst(at)fujitsu(dot)com
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > Attach the v61 patch set.
> >
>
> Few comments:
> ==============
> 1.
> pgoutput_row_filter()
> {
> ..
> +
> + oldctx = MemoryContextSwitchTo(CacheMemoryContext);
> + rfnode = n_filters > 1 ? makeBoolExpr(OR_EXPR, rfnodes[idx], -1) :
> linitial(rfnodes[idx]);
> + entry->exprstate[idx] = pgoutput_row_filter_init_expr(rfnode);
> + MemoryContextSwitchTo(oldctx);
> ..
> }
>
> rel_sync_cache_relation_cb()
> {
> ..
> + if (entry->exprstate[idx] != NULL)
> + {
> + pfree(entry->exprstate[idx]);
> + entry->exprstate[idx] = NULL;
> + }
> ..
> }
>
> I think this can leak memory as just freeing 'exprstate' is not
> sufficient. It contains other allocated memory as well like for
> 'steps'. Apart from that we might allocate other memory as well for
> generating expression state. I think it would be better if we can have
> another memory context (say cache_expr_cxt) in RelationSyncEntry and
> allocate it the first time we need it and then reset it instead of
> doing pfree of 'exprstate'. Also, we can free this new context in
> pgoutput_shutdown before destroying RelationSyncCache.
> 2. If we do the above, we can use this new context at all other places
> in the patch where it is using CacheMemoryContext.

Changed.

> 3.
> @@ -1365,6 +1785,7 @@ rel_sync_cache_publication_cb(Datum arg, int
> cacheid, uint32 hashvalue)
> {
> HASH_SEQ_STATUS status;
> RelationSyncEntry *entry;
> + MemoryContext oldctx;
>
> /*
> * We can get here if the plugin was used in SQL interface as the
> @@ -1374,6 +1795,8 @@ rel_sync_cache_publication_cb(Datum arg, int
> cacheid, uint32 hashvalue)
> if (RelationSyncCache == NULL)
> return;
>
> + oldctx = MemoryContextSwitchTo(CacheMemoryContext);
> +
> /*
> * There is no way to find which entry in our cache the hash belongs to so
> * mark the whole cache as invalid.
> @@ -1392,6 +1815,8 @@ rel_sync_cache_publication_cb(Datum arg, int
> cacheid, uint32 hashvalue)
> entry->pubactions.pubdelete = false;
> entry->pubactions.pubtruncate = false;
> }
> +
> + MemoryContextSwitchTo(oldctx);
> }
>
> Is there a reason for the above change?

Reverted this change.

> 4.
> +#define SET_NO_FILTER_FOR_CURRENT_PUBACTIONS \
> + if (pub->pubactions.pubinsert) \
> + no_filter[idx_ins] = true; \
> + if (pub->pubactions.pubupdate) \
> + no_filter[idx_upd] = true; \
> + if (pub->pubactions.pubdelete) \
> + no_filter[idx_del] = true
>
> I don't see the need for this macro and it makes code less readable. I
> think we can instead move this code to a function to avoid duplicate
> code.

I slightly refactor the code in this function to avoid duplicate code.

> 5.
> Multiple publications might have multiple row filters for
> + * this relation. Since row filter usage depends on the DML operation,
> + * there are multiple lists (one for each operation) which row filters
> + * will be appended.
>
> There seems to be a typo in the above sentence.
> /which row filters/to which row filters

Changed.

> 6.
> + /*
> + * Find if there are any row filters for this relation. If there are,
> + * then prepare the necessary ExprState and cache it in
> + * entry->exprstate.
> + *
> + * NOTE: All publication-table mappings must be checked.
> + *
> + * NOTE: If the relation is a partition and pubviaroot is true, use
> + * the row filter of the topmost partitioned table instead of the row
> + * filter of its own partition.
> + *
> + * NOTE: Multiple publications might have multiple row filters for
> + * this relation. Since row filter usage depends on the DML operation,
> + * there are multiple lists (one for each operation) which row filters
> + * will be appended.
> + *
> + * NOTE: FOR ALL TABLES implies "don't use row filter expression" so
> + * it takes precedence.
> + *
> + * NOTE: ALL TABLES IN SCHEMA implies "don't use row filter
> + * expression" if the schema is the same as the table schema.
> + */
> + foreach(lc, data->publications)
>
> Let's not add NOTE for each of these points but instead expand the
> first sentence as "Find if there are any row filters for this
> relation. If there are, then prepare the necessary ExprState and cache
> it in entry->exprstate. To build an expression state, we need to
> ensure the following:"

Changed.

Attach the v62 patch set which address the above comments and slightly
adjust the commit message in 0002 patch.

Best regards,
Hou zj

Attachment Content-Type Size
v62-0001-Row-filter-for-logical-replication.patch application/octet-stream 130.4 KB
v62-0002-Row-filter-updates-based-on-old-new-tuples.patch application/octet-stream 43.3 KB
v62-0003-Row-filter-tab-auto-complete-and-pgdump.patch application/octet-stream 5.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2022-01-11 02:21:57 Re: Skipping logical replication transactions on subscriber side
Previous Message Tatsuro Yamada 2022-01-11 01:57:29 Re: \dP and \dX use ::regclass without "pg_catalog."