Re: row filtering for logical replication

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, Euler Taveira <euler(at)eulerto(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Ajin Cherian <itsajin(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, japin <japinli(at)hotmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, David Steele <david(at)pgmasters(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: row filtering for logical replication
Date: 2022-01-12 05:16:43
Message-ID: CAHut+PucFM3Bt-gaTT7Pr-Y_x+R0y=L7uqbhjPMUsSPhdLhRpA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here are my review comments for v62-0001

~~~

1. src/backend/catalog/pg_publication.c - GetTopMostAncestorInPublication

@@ -276,17 +276,46 @@ GetPubPartitionOptionRelations(List *result,
PublicationPartOpt pub_partopt,
}

/*
+ * Check if any of the ancestors are published in the publication. If so,
+ * return the relid of the topmost ancestor that is published via this
+ * publication, otherwise InvalidOid.
+ */

The GetTopMostAncestorInPublication function header comment seems to
be saying the same thing twice. I think it can be simplified

Suggested function comment:

"Return the relid of the topmost ancestor that is published via this
publication, otherwise return InvalidOid."

~~~

2. src/backend/commands/publicationcmds.c - AlterPublicationTables

- /* Calculate which relations to drop. */
+ /*
+ * In order to recreate the relation list for the publication, look
+ * for existing relations that need not be dropped.
+ */

Suggested minor rewording of comment:

"... look for existing relations that do not need to be dropped."

~~~

3. src/backend/commands/publicationcmds.c - AlterPublicationTables

+
+ /*
+ * Look if any of the new set of relations match with the
+ * existing relations in the publication. Additionally, if the
+ * relation has an associated where-clause, check the
+ * where-clauses also match. Drop the rest.
+ */
if (RelationGetRelid(newpubrel->relation) == oldrelid)
Suggested minor rewording of comment:

"Look if any..." --> "Check if any..."

~~~

4. src/backend/executor/execReplication.c - CheckCmdReplicaIdentity

+ /*
+ * It is only safe to execute UPDATE/DELETE when all columns referenced in
+ * the row filters from publications which the relation is in are valid,
+ * which means all referenced columns are part of REPLICA IDENTITY, or the
+ * table does not publish UPDATES or DELETES.
+ */

Suggested minor rewording of comment:

"... in are valid, which means all ..." --> "... in are valid - i.e.
when all ..."

~~~

5. src/backend/parser/gram.y - ColId OptWhereClause

+ /*
+ * The OptWhereClause must be stored here but it is
+ * valid only for tables. If the ColId was mistakenly
+ * not a table this will be detected later in
+ * preprocess_pubobj_list() and an error is thrown.
+ */

Suggested minor rewording of comment:

"... and an error is thrown." --> "... and an error will be thrown."
~~~

6. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter

+ * ALL TABLES IN SCHEMA implies "don't use row filter expression" if
+ * the schema is the same as the table schema.
+ */
+ foreach(lc, data->publications)
+ {
+ Publication *pub = lfirst(lc);
+ HeapTuple rftuple = NULL;
+ Datum rfdatum = 0;
+ bool rfisnull;
+ bool pub_no_filter = false;
+ List *schemarelids = NIL;

Not all of these variables need to be declared at the top of the loop
like this. Consider moving some of them (e.g. rfisnull, schemarelids)
lower down to declare only in the scope that uses them.

~~~

7. src/backend/replication/pgoutput/pgoutput.c - pgoutput_row_filter

+ if (pub->pubactions.pubinsert)
+ no_filter[idx_ins] = true;
+ if (pub->pubactions.pubupdate)
+ no_filter[idx_upd] = true;
+ if (pub->pubactions.pubdelete)
+ no_filter[idx_del] = true;

This code can be simplified I think. e.g.

no_filter[idx_ins] |= pub->pubactions.pubinsert;
no_filter[idx_upd] |= pub->pubactions.pubupdate;
no_filter[idx_del] |= pub->pubactions.pubdelete;

~~~

8. src/backend/replication/pgoutput/pgoutput.c - get_rel_sync_entry

@@ -1245,9 +1650,6 @@ get_rel_sync_entry(PGOutputData *data, Oid relid)
entry->pubactions.pubtruncate |= pub->pubactions.pubtruncate;
}

- if (entry->pubactions.pubinsert && entry->pubactions.pubupdate &&
- entry->pubactions.pubdelete && entry->pubactions.pubtruncate)
- break;
}

I was not sure why that code was removed. Is it deliberate/correct?

~~~

9. src/backend/utils/cache/relcache.c - rowfilter_column_walker

@@ -5521,39 +5535,98 @@ RelationGetExclusionInfo(Relation indexRelation,
MemoryContextSwitchTo(oldcxt);
}

+
+
/*
- * Get publication actions for the given relation.
+ * Check if any columns used in the row filter WHERE clause are not part of
+ * REPLICA IDENTITY and save the invalid column number in
+ * rf_context::invalid_rfcolnum.
*/

There is an extra blank line before the function comment.

~~~

10. src/backend/utils/cache/relcache.c - GetRelationPublicationInfo

+ if (HeapTupleIsValid(rftuple))
+ {
+ Datum rfdatum;
+ bool rfisnull;
+ Node *rfnode;
+
+ context.pubviaroot = pubform->pubviaroot;
+ context.parentid = publish_as_relid;
+ context.relid = relid;
+
+ rfdatum = SysCacheGetAttr(PUBLICATIONRELMAP, rftuple,
+ Anum_pg_publication_rel_prqual,
+ &rfisnull);
+
+ if (!rfisnull)
+ {
+ rfnode = stringToNode(TextDatumGetCString(rfdatum));
+ rfcol_valid = !rowfilter_column_walker(rfnode, &context);
+ invalid_rfcolnum = context.invalid_rfcolnum;
+ pfree(rfnode);
+ }
+
+ ReleaseSysCache(rftuple);
+ }

Those 3 assignments to the context.pubviaroot/parentid/relid can be
moved to be inside the if (!rfisnull) block, because IIUC they don't
get used otherwise. Or, maybe better to just leave as-is; I am not
sure what is best. Please consider.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Julien Rouhaud 2022-01-12 05:41:42 2022-01 Commitfest
Previous Message Jaime Casanova 2022-01-12 04:51:07 Re: pg_upgrade parallelism