Re: row filtering for logical replication

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Ajin Cherian <itsajin(at)gmail(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, japin <japinli(at)hotmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, David Steele <david(at)pgmasters(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: row filtering for logical replication
Date: 2021-11-15 09:14:24
Message-ID: CAFiTN-tfYSMR+-z6tAugOwbOe2DwSm8gLsOcxMjGsN0KwQugmA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Nov 12, 2021 at 3:49 PM Ajin Cherian <itsajin(at)gmail(dot)com> wrote:
>
> Attaching version 39-
>

Some comments on 0006

--
/*
+ * Write UPDATE to the output stream using cached virtual slots.
+ * Cached updates will have both old tuple and new tuple.
+ */
+void
+logicalrep_write_update_cached(StringInfo out, TransactionId xid, Relation rel,
+ TupleTableSlot *oldtuple, TupleTableSlot *newtuple,
bool binary)
+{

Function, logicalrep_write_update_cached is exactly the same as
logicalrep_write_update, except calling logicalrep_write_tuple_cached
vs logicalrep_write_tuple. So I don't like the idea of making
complete duplicate copies. instead either we can keep a if check or we
can pass this logicalrep_write_tuple(_cached) as a function pointer.

--

Looking further, I realized that "logicalrep_write_tuple" and
"logicalrep_write_tuple_cached" are completely duplicate except first
one is calling "heap_deform_tuple" and then using local values[] array
and the second one is directly using the slot->values[] array, so in
fact we can pass this also as a parameter or we can put just one if
check the populate the values[] and null array, so if it is cached we
will point directly to the slot->values[] otherwise
heap_deform_tuple(), I think this should be just one simple check.
--
+
+/*
+ * Change is checked against the row filter, if any.
+ *
+ * If it returns true, the change is replicated, otherwise, it is not.
+ */
+static bool
+pgoutput_row_filter_virtual(Relation relation, TupleTableSlot *slot,
RelationSyncEntry *entry)

IMHO, the comments should explain how it is different from the
pgoutput_row_filter function. Also comments are saying "If it returns
true, the change is replicated, otherwise, it is not" which is not
exactly true for this function, I mean based on that the caller will
change the action. So I think it is enough to say what this function
is doing but not required to say what the caller will do based on what
this function returns.

--

+ for (i = 0; i < desc->natts; i++)
+ {
+ Form_pg_attribute att = TupleDescAttr(desc, i);
+
+ /* if the column in the new_tuple is null, nothing to do */
+ if (tmp_new_slot->tts_isnull[i])
+ continue;

Put some comments over this loop about what it is trying to do, and
overall I think there are not sufficient comments in the
pgoutput_row_filter_update_check function.

--
+ /*
+ * Unchanged toasted replica identity columns are
+ * only detoasted in the old tuple, copy this over to the newtuple.
+ */
+ if ((att->attlen == -1 &&
VARATT_IS_EXTERNAL_ONDISK(tmp_new_slot->tts_values[i])) &&
+ (!old_slot->tts_isnull[i] &&
+ !(VARATT_IS_EXTERNAL_ONDISK(old_slot->tts_values[i]))))

Is it ever possible that if the attribute is not NULL in the old slot
still it is stored as VARATT_IS_EXTERNAL_ONDISK? I think no, so
instead of adding
this last condition in check it should be asserted inside the if check.

--
static bool
-pgoutput_row_filter(PGOutputData *data, Relation relation, HeapTuple
oldtuple, HeapTuple newtuple, RelationSyncEntry *entry)
+pgoutput_row_filter_update_check(Relation relation, HeapTuple
oldtuple, HeapTuple newtuple, RelationSyncEntry *entry,
ReorderBufferChangeType *action)
+{

This function definition header is too long to fit in one line, so
better to break it. I think running will be a good idea.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2021-11-15 09:17:34 Re: Skipping logical replication transactions on subscriber side
Previous Message Daniel Gustafsson 2021-11-15 08:44:28 Re: Logical Replication - improve error message while adding tables to the publication in check_publication_add_relation