RE: row filtering for logical replication

From: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, japin <japinli(at)hotmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, David Steele <david(at)pgmasters(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: row filtering for logical replication
Date: 2022-01-13 13:21:22
Message-ID: OS0PR01MB5716121D296D75AFA84ADCB794539@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thursday, January 13, 2022 2:49 PM Peter Smith <smithpb2250(at)gmail(dot)com>
> Thanks for posting the merged v63.
>
> Here are my review comments for the v63-0001 changes.
>
> ~~~

Thanks for the comments!

> 1. src/backend/replication/logical/proto.c - logicalrep_write_tuple
>
> TupleDesc desc;
> - Datum values[MaxTupleAttributeNumber];
> - bool isnull[MaxTupleAttributeNumber];
> + Datum *values;
> + bool *isnull;
> int i;
> uint16 nliveatts = 0;
>
> Those separate declarations for values / isnull are not strictly
> needed anymore, so those vars could be deleted. IIRC those were only
> added before when there were both slots and tuples. OTOH, maybe you
> prefer to keep it this way just for code readability?

Yes, I prefer the current style for code readability.

>
> 2. src/backend/replication/pgoutput/pgoutput.c - typedef
>
> +typedef enum RowFilterPubAction
> +{
> + PUBACTION_INSERT,
> + PUBACTION_UPDATE,
> + PUBACTION_DELETE,
> + NUM_ROWFILTER_PUBACTIONS /* must be last */
> +} RowFilterPubAction;
>
> This typedef is not currently used by any of the code.
>
> So I think choices are:
>
> - Option 1: remove the typedef, because nobody is using it.
>
> - Option 2: keep the typedef, but use it! e.g. everywhere there is an
> exprstate array index variable probably it should be declared as a
> 'RowFilterPubAction idx' instead of just 'int idx'.

Thanks, I used the option 1.

>
> 3. src/backend/replication/pgoutput/pgoutput.c - map_changetype_pubaction
>
> After this recent v63 refactoring and merging of some APIs it seems
> that the map_changetype_pubaction is now ONLY used by
> pgoutput_row_filter function. So this can now be a static member of
> pgoutput_row_filter function instead of being declared at file scope.
>

Changed.

>
> 4. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter
> comments
> The function header comment says the same thing 2x about the return values.
>

Changed.

>
> ~~~
>
> 5. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter
>
> + ExprContext *ecxt;
> + int filter_index = map_changetype_pubaction[*action];
> +
> + /* *action is already assigned default by caller */
> + Assert(*action == REORDER_BUFFER_CHANGE_INSERT ||
> + *action == REORDER_BUFFER_CHANGE_UPDATE ||
> + *action == REORDER_BUFFER_CHANGE_DELETE);
> +
>
> Accessing the map_changetype_pubaction array should be done *after* the
> Assert.
>
> ~~~

Changed.

>
> 6. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter
>
> ExprState *filter_exprstate;
> ...
> filter_exprstate = entry->exprstate[map_changetype_pubaction[*action]];
>
> Please consider what way you think is best.

Changed as suggested.

>
> 7. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter
> There are several things not quite right with that comment:
> a. I thought now it should refer to "slots" instead of "tuples"
>
> Suggested replacement comment:

Changed but I prefer "tuple" which is easy to understand.

> ~~~
>
> 8. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter
>
> + if (!new_slot || !old_slot)
> + {
> + ecxt->ecxt_scantuple = new_slot ? new_slot : old_slot;
> + result = pgoutput_row_filter_exec_expr(entry->exprstate[filter_index],
> + ecxt);
> +
> + FreeExecutorState(estate);
> + PopActiveSnapshot();
> +
> + return result;
> + }
> +
> + tmp_new_slot = new_slot;
> + slot_getallattrs(new_slot);
> + slot_getallattrs(old_slot);
>
> I think after this "if" condition then the INSERT, DELETE and simple
> UPDATE are already handled. So, the remainder of the code is for
> deciding what update transformation is needed etc.
>
> I think there should be some block comment somewhere here to make that
> more obvious.

Changed.
> ~~
>
> 9. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter
>
> Many of the comments in this function are still referring to old/new
> "tuple". Now that all the params are slots instead of tuples maybe now
> all the comments should also refer to "slots" instead of "tuples".
> Please search all the comments - e.g. including all the "Case 1:" ...
> "Case 4:" comments.

I also think tuple still makes sense, so I didn’t change this.

>
> 10. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter_init
>
> + int idx_ins = PUBACTION_INSERT;
> + int idx_upd = PUBACTION_UPDATE;
> + int idx_del = PUBACTION_DELETE;
>
> These variables are unnecessary now... They previously were added only
> as short synonyms because the other enum names were too verbose (e.g.
> REORDER_BUFFER_CHANGE_INSERT) but now that we have shorter enum
> names
> like PUBACTION_INSERT we can just use those names directly
>
Changed.

>
> 11. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter_init
>
> I felt that the code would seem more natural if the
> pgoutput_row_filter_init function came *before* the
> pgoutput_row_filter function in the source code.
>
Changed.

>
> 12. src/backend/replication/pgoutput/pgoutput.c - pgoutput_change
>
> @@ -634,6 +1176,9 @@ pgoutput_change(LogicalDecodingContext *ctx,
> ReorderBufferTXN *txn,
> RelationSyncEntry *relentry;
> TransactionId xid = InvalidTransactionId;
> Relation ancestor = NULL;
> + ReorderBufferChangeType modified_action = change->action;
> + TupleTableSlot *old_slot = NULL;
> + TupleTableSlot *new_slot = NULL;
>
> It seemed a bit misleading to me to call this variable
> 'modified_action' since mostly it is not modified at all.
>
> IMO it is better just to call this as 'action' but then add a comment
> (above the "switch (modified_action)") to say the previous call to
> pgoutput_row_filter may have transformed it to a different action.
>
Changed.

I have included these changes in v64 patch set.

Best regards,
Hou zj

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2022-01-13 13:42:42 Re: Non-decimal integer literals
Previous Message Pavel Borisov 2022-01-13 13:21:06 Re: UNIQUE null treatment option