Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Önder Kalacı <onderkalaci(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Marco Slot <marco(dot)slot(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, shiy(dot)fnst(at)fujitsu(dot)com, wangw(dot)fnst(at)fujitsu(dot)com, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date: 2023-02-15 02:16:31
Message-ID: CAHut+PvDhmnqz0mH8d4x_K_nYZ50b_fxh=R1C0a+1hNC5vJ8gg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here are some review comments for patch v23.

======
General

1.
IIUC the previous logic for checking "cost" comparisons and selecting
the "cheapest" strategy is no longer present in the latest patch.

In that case, I think there are some leftover stale comments that need
changing. For example,

1a. Commit message:
"let the planner sub-modules compare the costs of index versus
sequential scan and choose the cheapest."

~

1b. Commit message:
"Finally, pick the cheapest `Path` among."

~

1c. FindLogicalRepUsableIndex function:
+ * We are looking for one more opportunity for using an index. If
+ * there are any indexes defined on the local relation, try to pick
+ * the cheapest index.

======
doc/src/sgml/logical-replication.sgml

If replica identity "full" is used, indexes can be used on the
subscriber side for seaching the rows. The index should be btree,
non-partial and have at least one column reference (e.g., should not
consists of only expressions). If there are no suitable indexes, the
search on the subscriber side is very inefficient and should only be
used as a fallback if no other solution is possible

2a.
Fixed typo "seaching", and minor rewording.

SUGGESTION
When replica identity "full" is specified, indexes can be used on the
subscriber side for searching the rows. These indexes should be btree,
non-partial and have at least one column reference (e.g., should not
consist of only expressions). If there are no such suitable indexes,
the search on the subscriber side can be very inefficient, therefore
replica identity "full" should only be used as a fallback if no other
solution is possible.

~

2b.
I know you are just following some existing text here, but IMO this
should probably refer to replica identity <literal>FULL</literal>
instead of "full".

======
src/backend/executor/execReplication.c

3. IdxIsRelationIdentityOrPK

+/*
+ * Given a relation and OID of an index, returns true if
+ * the index is relation's primary key's index or
+ * relaton's replica identity index.
+ *
+ * Returns false otherwise.
+ */
+static bool
+IdxIsRelationIdentityOrPK(Relation rel, Oid idxoid)
+{
+ Assert(OidIsValid(idxoid));
+
+ if (RelationGetReplicaIndex(rel) == idxoid ||
+ RelationGetPrimaryKeyIndex(rel) == idxoid)
+ return true;
+
+ return false;

3a.
Comment typo "relaton"

~

3b.
Code could be written using single like below if you wish (but see #2c)

return RelationGetReplicaIndex(rel) == idxoid ||
RelationGetPrimaryKeyIndex(rel) == idxoid;

~

3c.
Actually, RelationGetReplicaIndex and RelationGetPrimaryKeyIndex
implementations are very similar so it seemed inefficient to be
calling both of them. IMO it might be better to just make a new
relcache function IdxIsRelationIdentityOrPK(Relation rel, Oid idxoid).
This implementation will be similar to those others, but now you need
only to call the workhorse RelationGetIndexList *one* time.

~~~

4. RelationFindReplTupleByIndex

bool found;
+ TypeCacheEntry **eq = NULL; /* only used when the index is not repl. ident
+ * or pkey */
+ bool idxIsRelationIdentityOrPK;

If you change the comment to say "RI" instead of "repl. Ident" then it
can all fit on one line, which would be an improvement.

======
src/backend/replication/logical/relation.c

5.
#include "replication/logicalrelation.h"
#include "replication/worker_internal.h"
+#include "optimizer/cost.h"
#include "utils/inval.h"

Can that #include be added in alphabetical order like the others or not?

~~~

6. logicalrep_partition_open

+ /*
+ * Finding a usable index is an infrequent task. It occurs when an
+ * operation is first performed on the relation, or after invalidation of
+ * of the relation cache entry (such as ANALYZE or CREATE/DROP index on
+ * the relation).
+ */
+ entry->usableIndexOid = FindLogicalRepUsableIndex(partrel, remoterel);
+

Typo "of of the relation"

~~~

7. FindUsableIndexForReplicaIdentityFull

+static Oid
+FindUsableIndexForReplicaIdentityFull(Relation localrel)
+{
+ MemoryContext usableIndexContext;
+ MemoryContext oldctx;
+ Oid usableIndex;
+ Oid idxoid;
+ List *indexlist;
+ ListCell *lc;
+ Relation indexRelation;
+ IndexInfo *indexInfo;
+ bool is_btree;
+ bool is_partial;
+ bool is_only_on_expression;

It looks like some of these variables are only used within the scope
of the foreach loop, so I think that is where they should be declared.

~~~

8.
+ usableIndex = InvalidOid;

Might as well do that assignment at the declaration.

~~~

9. FindLogicalRepUsableIndex

+ /*
+ * Simple case, we already have a primary key or a replica identity index.
+ *
+ * Note that we do not use index scans below when enable_indexscan is
+ * false. Allowing primary key or replica identity even when index scan is
+ * disabled is the legacy behaviour. So we hesitate to move the below
+ * enable_indexscan check to be done earlier in this function.
+ */
+ idxoid = GetRelationIdentityOrPK(localrel);
+ if (OidIsValid(idxoid))
+ return idxoid;
+
+ /* If index scans are disabled, use a sequential scan */
+ if (!enable_indexscan)
+ return InvalidOid;

~

IMO that "Note" really belongs with the if (!enable)indexscan) more like this:

SUGGESTION
/*
* Simple case, we already have a primary key or a replica identity index.
*/
idxoid = GetRelationIdentityOrPK(localrel);
if (OidIsValid(idxoid))
return idxoid;

/*
* If index scans are disabled, use a sequential scan.
*
* Note we hesitate to move this check to earlier in this function
* because allowing primary key or replica identity even when index scan
* is disabled is the legacy behaviour.
*/
if (!enable_indexscan)
return InvalidOid;

======
src/backend/replication/logical/worker.c

10. get_usable_indexoid

+/*
+ * Decide whether we can pick an index for the relinfo (e.g., the relation)
+ * we're actually deleting/updating from. If it is a child partition of
+ * edata->targetRelInfo, find the index on the partition.
+ *
+ * Note that if the corresponding relmapentry has invalid usableIndexOid,
+ * the function returns InvalidOid.
+ */

"(e.g., the relation)" --> "(i.e. the relation)"

------
Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2023-02-15 02:33:57 Re: Perform streaming logical transactions by background workers and parallel apply
Previous Message Andres Freund 2023-02-15 02:00:28 Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?