Re: row filtering for logical replication

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Euler Taveira <euler(at)eulerto(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, japin <japinli(at)hotmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, David Steele <david(at)pgmasters(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: row filtering for logical replication
Date: 2022-01-24 08:37:35
Message-ID: CAHut+PsRTtXoYQiRqxwvyrcmkDMm-kR4GkvD9-nAqNrk4A3aCQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks for all the patches!

Here are my review comments for v69-0001

~~~

1. src/backend/executor/execReplication.c CheckCmdReplicaIdentity
call to RelationBuildPublicationDesc

+ /*
+ * It is only safe to execute UPDATE/DELETE when all columns, referenced in
+ * the row filters from publications which the relation is in, are valid -
+ * i.e. when all referenced columns are part of REPLICA IDENTITY or the
+ * table does not publish UPDATES or DELETES.
+ */
+ pubdesc = RelationBuildPublicationDesc(rel);

This code is leaky because never frees the palloc-ed memory for the pubdesc.

IMO change the RelationBuildPublicationDesc to pass in the
PublicationDesc* from the call stack then can eliminate the palloc and
risk of leaks.

~~~

2. src/include/utils/relcache.h - RelationBuildPublicationDesc

+struct PublicationDesc;
+extern struct PublicationDesc *RelationBuildPublicationDesc(Relation relation);

(Same as the previous comment #1). Suggest to change the function
signature to be void and pass the PublicationDesc* from stack instead
of palloc-ing it within the function

~~~

3. src/backend/utils/cache/relcache.c - RelationBuildPublicationDesc

+RelationBuildPublicationDesc(Relation relation)
{
List *puboids;
ListCell *lc;
MemoryContext oldcxt;
Oid schemaid;
- PublicationActions *pubactions = palloc0(sizeof(PublicationActions));
+ List *ancestors = NIL;
+ Oid relid = RelationGetRelid(relation);
+ AttrNumber invalid_rfcolnum = InvalidAttrNumber;
+ PublicationDesc *pubdesc = palloc0(sizeof(PublicationDesc));
+ PublicationActions *pubactions = &pubdesc->pubactions;
+
+ pubdesc->rf_valid_for_update = true;
+ pubdesc->rf_valid_for_delete = true;

IMO it wold be better to change the "sense" of those variables.
e.g.

"rf_valid_for_update" --> "rf_invalid_for_update"
"rf_valid_for_delete" --> "rf_invalid_for_delete"

That way they have the same 'sense' as the AttrNumbers so it all reads
better to me.

Also, it means no special assignment is needed because the palloc0
will set them correctly

~~~

4. src/backend/utils/cache/relcache.c - RelationBuildPublicationDesc

- if (relation->rd_pubactions)
+ if (relation->rd_pubdesc)
{
- pfree(relation->rd_pubactions);
- relation->rd_pubactions = NULL;
+ pfree(relation->rd_pubdesc);
+ relation->rd_pubdesc = NULL;
}

What is the purpose of this code? Can't it all just be removed?
e.g. Can't you Assert that relation->rd_pubdesc is NULL at this point?

(if it was not-null the function would have returned immediately from the top)

~~~

5. src/include/catalog/pg_publication.h - typedef struct PublicationDesc

+typedef struct PublicationDesc
+{
+ /*
+ * true if the columns referenced in row filters which are used for UPDATE
+ * or DELETE are part of the replica identity, or the publication actions
+ * do not include UPDATE or DELETE.
+ */
+ bool rf_valid_for_update;
+ bool rf_valid_for_delete;
+
+ AttrNumber invalid_rfcol_update;
+ AttrNumber invalid_rfcol_delete;
+
+ PublicationActions pubactions;
+} PublicationDesc;
+

I did not see any point really for the pairs of booleans and AttNumbers.
AFAIK both of them shared exactly the same validation logic so I think
you can get by using fewer members here.

e.g. (here I also reversed the sense of the bool flag, as per my suggestion #3)

typedef struct PublicationDesc
{
/*
* true if the columns referenced in row filters which are used for UPDATE
* or DELETE are part of the replica identity, or the publication actions
* do not include UPDATE or DELETE.
*/
bool rf_invalid_for_upd_del;
AttrNumber invalid_rfcol_upd_del;

PublicationActions pubactions;
} PublicationDesc;

~~~

6. src/tools/pgindent/typedefs.list

Missing the new typedef PublicationDesc

------
Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2022-01-24 08:41:20 Re: Non-decimal integer literals
Previous Message Kyotaro Horiguchi 2022-01-24 08:19:48 Re: BufferAlloc: don't take two simultaneous locks