Re: row filtering for logical replication

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Ajin Cherian <itsajin(at)gmail(dot)com>
Cc: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, japin <japinli(at)hotmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, David Steele <david(at)pgmasters(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: row filtering for logical replication
Date: 2021-11-15 01:01:37
Message-ID: CAHut+PsZ2xsRZw4AyRQuLfO4gYiqCpNVNDRbv_RN1XUUo3KWsw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Nov 12, 2021 at 9:19 PM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
>
> Attaching version 39-

Here are some review comments for v39-0006:

1)
@@ -261,9 +261,9 @@ rowfilter_expr_replident_walker(Node *node,
rf_context *context)
* Rule 1. Walk the parse-tree and reject anything other than very simple
* expressions (See rowfilter_validator for details on what is permitted).
*
- * Rule 2. If the publish operation contains "delete" then only columns that
- * are allowed by the REPLICA IDENTITY rules are permitted to be used in the
- * row-filter WHERE clause.
+ * Rule 2. If the publish operation contains "delete" or "delete" then only
+ * columns that are allowed by the REPLICA IDENTITY rules are permitted to
+ * be used in the row-filter WHERE clause.
*/
static void
rowfilter_expr_checker(Publication *pub, ParseState *pstate, Node
*rfnode, Relation rel)
@@ -276,12 +276,10 @@ rowfilter_expr_checker(Publication *pub,
ParseState *pstate, Node *rfnode, Relat
rowfilter_validator(relname, rfnode);

/*
- * Rule 2: For "delete", check that filter cols are also valid replica
+ * Rule 2: For "delete" and "update", check that filter cols are also
valid replica
* identity cols.
- *
- * TODO - check later for publish "update" case.
*/
- if (pub->pubactions.pubdelete)

1a)
Typo - the function comment: "delete" or "delete"; should say:
"delete" or "update"

1b)
I felt it would be better (for the comment in the function body) to
write it as "or" instead of "and" because then it matches with the
code "if ||" that follows this comment.

====

2)
@@ -746,6 +780,92 @@ logicalrep_read_typ(StringInfo in, LogicalRepTyp *ltyp)
}

/*
+ * Write a tuple to the outputstream using cached slot, in the most
efficient format possible.
+ */
+static void
+logicalrep_write_tuple_cached(StringInfo out, Relation rel,
TupleTableSlot *slot, bool binary)

The function logicalrep_write_tuple_cached seems to have almost all of
its function body in common with logicalrep_write_tuple. Is there any
good way to combine these functions to avoid ~80 lines mostly
duplicated code?

====

3)
+ if (!old_matched && !new_matched)
+ return false;
+
+ if (old_matched && new_matched)
+ *action = REORDER_BUFFER_CHANGE_UPDATE;
+ else if (old_matched && !new_matched)
+ *action = REORDER_BUFFER_CHANGE_DELETE;
+ else if (new_matched && !old_matched)
+ *action = REORDER_BUFFER_CHANGE_INSERT;
+
+ return true;

I felt it is slightly confusing to have inconsistent ordering of the
old_matched and new_matched in those above conditions.

I suggest to use the order like:
* old-row (no match) new-row (no match)
* old-row (no match) new row (match)
* old-row (match) new-row (no match)
* old-row (match) new row (match)

And then be sure to keep consistent ordering in all places it is mentioned:
* in the code
* in the function header comment
* in the commit comment
* in docs?

====

4)
+/*
+ * Change is checked against the row filter, if any.
+ *
+ * If it returns true, the change is replicated, otherwise, it is not.
+ */
+static bool
+pgoutput_row_filter_virtual(Relation relation, TupleTableSlot *slot,
RelationSyncEntry *entry)
+{
+ EState *estate;
+ ExprContext *ecxt;
+ bool result = true;
+ Oid relid = RelationGetRelid(relation);
+
+ /* Bail out if there is no row filter */
+ if (!entry->exprstate)
+ return true;
+
+ elog(DEBUG3, "table \"%s.%s\" has row filter",
+ get_namespace_name(get_rel_namespace(relid)),
+ get_rel_name(relid));

It seems like that elog may consume unnecessary CPU most of the time.
I think it might be better to remove the relid declaration and rewrite
that elog as:

if (message_level_is_interesting(DEBUG3))
elog(DEBUG3, "table \"%s.%s\" has row filter",
get_namespace_name(get_rel_namespace(entry->relid)),
get_rel_name(entry->relid));

====

5)
diff --git a/src/include/replication/reorderbuffer.h
b/src/include/replication/reorderbuffer.h
index 5b40ff7..aec0059 100644
--- a/src/include/replication/reorderbuffer.h
+++ b/src/include/replication/reorderbuffer.h
@@ -51,7 +51,7 @@ typedef struct ReorderBufferTupleBuf
* respectively. They're used by INSERT .. ON CONFLICT .. UPDATE. Users of
* logical decoding don't have to care about these.
*/
-enum ReorderBufferChangeType
+typedef enum ReorderBufferChangeType
{
REORDER_BUFFER_CHANGE_INSERT,
REORDER_BUFFER_CHANGE_UPDATE,
@@ -65,7 +65,7 @@ enum ReorderBufferChangeType
REORDER_BUFFER_CHANGE_INTERNAL_SPEC_CONFIRM,
REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT,
REORDER_BUFFER_CHANGE_TRUNCATE
-};
+} ReorderBufferChangeType;

This new typedef can be added to src/tools/pgindent/typedefs.list.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Shinya Kato 2021-11-15 01:03:22 Re: Emit a warning if the extension's GUC is set incorrectly
Previous Message Tom Lane 2021-11-14 23:46:34 Re: JIT doing duplicative optimization?