Quick Links

Re: row filtering for logical replication

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	"houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc:	Peter Smith <smithpb2250(at)gmail(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, japin <japinli(at)hotmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, David Steele <david(at)pgmasters(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: row filtering for logical replication
Date:	2022-01-10 06:37:25
Message-ID:	CAA4eK1+mxOUU=hZSv5xDd4WzTN5aV+u_3QVOh0iHXNxB6aLTMw@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Mon, Jan 10, 2022 at 8:41 AM houzj(dot)fnst(at)fujitsu(dot)com
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> Attach the v61 patch set.
>

Few comments:
==============
1.
pgoutput_row_filter()
{
..
+
+ oldctx = MemoryContextSwitchTo(CacheMemoryContext);
+ rfnode = n_filters > 1 ? makeBoolExpr(OR_EXPR, rfnodes[idx], -1) :
linitial(rfnodes[idx]);
+ entry->exprstate[idx] = pgoutput_row_filter_init_expr(rfnode);
+ MemoryContextSwitchTo(oldctx);
..
}

rel_sync_cache_relation_cb()
{
..
+ if (entry->exprstate[idx] != NULL)
+ {
+ pfree(entry->exprstate[idx]);
+ entry->exprstate[idx] = NULL;
+ }
..
}

I think this can leak memory as just freeing 'exprstate' is not
sufficient. It contains other allocated memory as well like for
'steps'. Apart from that we might allocate other memory as well for
generating expression state. I think it would be better if we can have
another memory context (say cache_expr_cxt) in RelationSyncEntry and
allocate it the first time we need it and then reset it instead of
doing pfree of 'exprstate'. Also, we can free this new context in
pgoutput_shutdown before destroying RelationSyncCache.

2. If we do the above, we can use this new context at all other places
in the patch where it is using CacheMemoryContext.

3.
@@ -1365,6 +1785,7 @@ rel_sync_cache_publication_cb(Datum arg, int
cacheid, uint32 hashvalue)
{
HASH_SEQ_STATUS status;
RelationSyncEntry *entry;
+ MemoryContext oldctx;

/*
* We can get here if the plugin was used in SQL interface as the
@@ -1374,6 +1795,8 @@ rel_sync_cache_publication_cb(Datum arg, int
cacheid, uint32 hashvalue)
if (RelationSyncCache == NULL)
return;

+ oldctx = MemoryContextSwitchTo(CacheMemoryContext);
+
/*
* There is no way to find which entry in our cache the hash belongs to so
* mark the whole cache as invalid.
@@ -1392,6 +1815,8 @@ rel_sync_cache_publication_cb(Datum arg, int
cacheid, uint32 hashvalue)
entry->pubactions.pubdelete = false;
entry->pubactions.pubtruncate = false;
}
+
+ MemoryContextSwitchTo(oldctx);
}

Is there a reason for the above change?

4.
+#define SET_NO_FILTER_FOR_CURRENT_PUBACTIONS \
+ if (pub->pubactions.pubinsert) \
+ no_filter[idx_ins] = true; \
+ if (pub->pubactions.pubupdate) \
+ no_filter[idx_upd] = true; \
+ if (pub->pubactions.pubdelete) \
+ no_filter[idx_del] = true

I don't see the need for this macro and it makes code less readable. I
think we can instead move this code to a function to avoid duplicate
code.

5.
Multiple publications might have multiple row filters for
+ * this relation. Since row filter usage depends on the DML operation,
+ * there are multiple lists (one for each operation) which row filters
+ * will be appended.

There seems to be a typo in the above sentence.
/which row filters/to which row filters

6.
+ /*
+ * Find if there are any row filters for this relation. If there are,
+ * then prepare the necessary ExprState and cache it in
+ * entry->exprstate.
+ *
+ * NOTE: All publication-table mappings must be checked.
+ *
+ * NOTE: If the relation is a partition and pubviaroot is true, use
+ * the row filter of the topmost partitioned table instead of the row
+ * filter of its own partition.
+ *
+ * NOTE: Multiple publications might have multiple row filters for
+ * this relation. Since row filter usage depends on the DML operation,
+ * there are multiple lists (one for each operation) which row filters
+ * will be appended.
+ *
+ * NOTE: FOR ALL TABLES implies "don't use row filter expression" so
+ * it takes precedence.
+ *
+ * NOTE: ALL TABLES IN SCHEMA implies "don't use row filter
+ * expression" if the schema is the same as the table schema.
+ */
+ foreach(lc, data->publications)

Let's not add NOTE for each of these points but instead expand the
first sentence as "Find if there are any row filters for this
relation. If there are, then prepare the necessary ExprState and cache
it in entry->exprstate. To build an expression state, we need to
ensure the following:"

--
With Regards,
Amit Kapila.

In response to

RE: row filtering for logical replication at 2022-01-10 03:11:30 from houzj.fnst@fujitsu.com

Responses

Re: row filtering for logical replication at 2022-01-11 01:22:31 from Peter Smith
RE: row filtering for logical replication at 2022-01-11 02:16:05 from houzj.fnst@fujitsu.com

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Andres Freund	2022-01-10 06:39:58	Re: pg_upgrade verbosity when redirecting output to log file
Previous Message	Tom Lane	2022-01-10 06:14:32	Re: pg_upgrade verbosity when redirecting output to log file