Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Önder Kalacı <onderkalaci(at)gmail(dot)com>
Cc: "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date: 2022-08-24 09:06:13
Message-ID: CAHut+PuuY40mameDJFzw9ABOGGjJwJf5TB5zkKOh1P0N1WKjMg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here are some review comments for the patch v8-0001:

======

1. Commit message

1a.
Majority of the logic on the subscriber side has already existed in the code.

SUGGESTION
The majority of the logic on the subscriber side already exists in the code.

~

1b.
Second, when REPLICA IDENTITY IS FULL on the publisher and an index is
used on the subscriber...

SUGGESTION
Second, when REPLICA IDENTITY FULL is on the publisher and an index is
used on the subscriber...

~

1c.
Still, below I try to show case the potential improvements using an
index on the subscriber
`pgbench_accounts(bid)`. With the index, the replication catches up
around ~5 seconds.
When the index is dropped, the replication takes around ~300 seconds.

"show case" -> "showcase"

~

1d.
In above text, what was meant by "catches up around ~5 seconds"?
e.g. Did it mean *improves* by ~5 seconds, or *takes* ~5 seconds?

~

1e.
// create one indxe, even on a low cardinality column

typo "indxe"

======

2. GENERAL

2a.
There are lots of single-line comments that start lowercase, but by
convention, I think they should start uppercase.

e.g. + /* we should always use at least one attribute for the index scan */
e.g. + /* we might not need this if the index is unique */
e.g. + /* avoid expensive equality check if index is unique */
e.g. + /* unrelated Path, skip */
e.g. + /* simple case, we already have an identity or pkey */
e.g. + /* indexscans are disabled, use seq. scan */
e.g. + /* target is a regular table */

~~

2b.
There are some excess blank lines between the function. By convention,
I think 1 blank line is normal, but here there are sometimes 2.

~~

2c.
There are some new function comments which include their function name
in the comment. It seemed unnecessary.

e.g. GetCheapestReplicaIdentityFullPath
e.g. FindUsableIndexForReplicaIdentityFull
e.g. LogicalRepUsableIndex

======

3. src/backend/executor/execReplication.c - build_replindex_scan_key

- int attoff;
+ int index_attoff;
+ int scankey_attoff;
bool isnull;
Datum indclassDatum;
oidvector *opclass;
int2vector *indkey = &idxrel->rd_index->indkey;
- bool hasnulls = false;
-
- Assert(RelationGetReplicaIndex(rel) == RelationGetRelid(idxrel) ||
- RelationGetPrimaryKeyIndex(rel) == RelationGetRelid(idxrel));

indclassDatum = SysCacheGetAttr(INDEXRELID, idxrel->rd_indextuple,
Anum_pg_index_indclass, &isnull);
Assert(!isnull);
opclass = (oidvector *) DatumGetPointer(indclassDatum);
+ scankey_attoff = 0;

Maybe just assign scankey_attoff = 0 at the declaration?

~~~

4.

+ /*
+ * There are two cases to consider. First, if the index is a primary or
+ * unique key, we cannot have any indexes with expressions. So, at this
+ * point we are sure that the index we deal is not these.
+ */

"we deal" -> "we are dealing with" ?

~~~

5.

+ /*
+ * For a non-primary/unique index with an additional expression, do
+ * not have to continue at this point. However, the below code
+ * assumes the index scan is only done for simple column references.
+ */
+ continue;

Is this one of those comments that ought to have a "XXX" prefix as a
note for the future?

~~~

6.

- int pkattno = attoff + 1;
...
/* Initialize the scankey. */
- ScanKeyInit(&skey[attoff],
- pkattno,
+ ScanKeyInit(&skey[scankey_attoff],
+ index_attoff + 1,
BTEqualStrategyNumber,
Wondering if it would have been simpler if you just did:
int pkattno = index_attoff + 1;

~~~

7.

- skey[attoff].sk_flags |= SK_ISNULL;
+ skey[scankey_attoff].sk_flags |= SK_ISNULL;
+ skey[scankey_attoff].sk_flags |= SK_SEARCHNULL;

SUGGESTION
skey[scankey_attoff].sk_flags |= (SK_ISNULL | SK_SEARCHNULL)

~~~

8. src/backend/executor/execReplication.c - RelationFindReplTupleByIndex

@@ -128,28 +171,44 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
TransactionId xwait;
Relation idxrel;
bool found;
+ TypeCacheEntry **eq;
+ bool indisunique;
+ int scankey_attoff;

/* Open the index. */
idxrel = index_open(idxoid, RowExclusiveLock);
+ indisunique = idxrel->rd_index->indisunique;
+
+ /* we might not need this if the index is unique */
+ eq = NULL;

Maybe just default assign eq = NULL in the declaration?

~~~

9.

+ scan = index_beginscan(rel, idxrel, &snap,
+ scankey_attoff, 0);

Unnecessary wrapping?

~~~

10.

+ /* we only need to allocate once */
+ if (eq == NULL)
+ eq = palloc0(sizeof(*eq) * outslot->tts_tupleDescriptor->natts);

But shouldn't you also free this 'eq' before the function returns, to
prevent leaking memory?

======

11. src/backend/replication/logical/relation.c - logicalrep_rel_open

+ /*
+ * Finding a usable index is an infrequent operation, it is performed
+ * only when first time an operation is performed on the relation or
+ * after invalidation of the relation cache entry (e.g., such as ANALYZE).
+ */

SUGGESTION (minor rewording)
Finding a usable index is an infrequent task. It is performed only
when an operation is first performed on the relation, or after
invalidation of the relation cache entry (e.g., such as ANALYZE).

~~~

12. src/backend/replication/logical/relation.c - logicalrep_partition_open

Same as comment #11 above.

~~~

13. src/backend/replication/logical/relation.c - GetIndexOidFromPath

+static
+Oid
+GetIndexOidFromPath(Path *path)

Typically I think 'static Oid' should be on one line.

~~~

14.

+ switch (path->pathtype)
+ {
+ case T_IndexScan:
+ case T_IndexOnlyScan:
+ {
+ IndexPath *index_sc = (IndexPath *) path;
+ indexOid = index_sc->indexinfo->indexoid;
+
+ break;
+ }
+
+ default:
+ indexOid = InvalidOid;
+ }

Is there any point in using a switch statement when there is only one
functional code block?

Why not just do:

if (path->pathtype == T_IndexScan || path->pathtype == T_IndexOnlyScan)
{
...
}

return InvalidOid;

~~~

15. src/backend/replication/logical/relation.c - IndexOnlyOnExpression

+ * Returns true if the given index consist only of expressions such as:
+ * CREATE INDEX idx ON table(foo(col));

"consist" -> "consists"

~~~

16.

+IndexOnlyOnExpression(IndexInfo *indexInfo)
+{
+ int i=0;
+ for (i = 0; i < indexInfo->ii_NumIndexKeyAttrs; i++)

Don't initialise 'i' twice.

~~~

17.

+ AttrNumber attnum = indexInfo->ii_IndexAttrNumbers[i];
+ if (AttributeNumberIsValid(attnum))
+ return false;
+
+ }

Spurious blank line

~~~

18. src/backend/replication/logical/relation.c -
GetCheapestReplicaIdentityFullPath

+/*
+ * Iterates over the input path list, and returns another path list
+ * where paths with non-btree indexes, partial indexes or
+ * indexes on only expressions are eliminated from the list.
+ */

"path list, and" -> "path list and"

~~~

19.

+ if (!OidIsValid(indexOid))
+ {
+ /* unrelated Path, skip */
+ suitableIndexList = lappend(suitableIndexList, path);
+ continue;
+ }
+
+ indexRelation = index_open(indexOid, AccessShareLock);
+ indexInfo = BuildIndexInfo(indexRelation);
+ is_btree_index = (indexInfo->ii_Am == BTREE_AM_OID);
+ is_partial_index = (indexInfo->ii_Predicate != NIL);
+ is_index_only_on_expression = IndexOnlyOnExpression(indexInfo);
+ index_close(indexRelation, NoLock);
+
+ if (!is_btree_index || is_partial_index || is_index_only_on_expression)
+ continue;

Maybe better to change this logic using if/else and changing the last
condition so them you can avoid having any of those 'continue' in this
loop.

~~~

20. src/backend/replication/logical/relation.c -
GetCheapestReplicaIdentityFullPath

+/*
+ * GetCheapestReplicaIdentityFullPath generates all the possible paths
+ * for the given subscriber relation, assuming that the source relation
+ * is replicated via REPLICA IDENTITY FULL.
+ *
+ * The function assumes that all the columns will be provided during
+ * the execution phase, given that REPLICA IDENTITY FULL gurantees
+ * that.
+ */

20a.
typo "gurantees"

~

20b.
The function comment neglects to say that after getting all these
paths the final function return is the cheapest one that it found.

~~~

21.

+ for (attno = 0; attno < RelationGetNumberOfAttributes(localrel); attno++)
+ {
+ Form_pg_attribute attr = TupleDescAttr(localrel->rd_att, attno);
+
+ if (attr->attisdropped)
+ {
+ continue;
+ }
+ else
+ {
+ Expr *eq_op;

Maybe simplify by just removing the 'else' or instead just reverse the
condition of the 'if'.

~~~

22.

+ /*
+ * A sequential scan has could have been dominated by
+ * by an index scan during make_one_rel(). We should always
+ * have a sequential scan before set_cheapest().
+ */

"has could have been" -> "could have been"

~~~

23. src/backend/replication/logical/relation.c - LogicalRepUsableIndex

+static Oid
+LogicalRepUsableIndex(Relation localrel, LogicalRepRelation *remoterel)
+{
+ Oid idxoid;
+
+ /*
+ * We never need index oid for partitioned tables, always rely on leaf
+ * partition's index.
+ */
+ if (localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ return InvalidOid;
+
+ /* simple case, we already have an identity or pkey */
+ idxoid = GetRelationIdentityOrPK(localrel);
+ if (OidIsValid(idxoid))
+ return idxoid;
+
+ /* indexscans are disabled, use seq. scan */
+ if (!enable_indexscan)
+ return InvalidOid;

I thought the (!enable_indexscan) fast exit perhaps should be done
first, or at least before calling GetRelationIdentityOrPK.

======

24. src/backend/replication/logical/worker.c - apply_handle_delete_internal

@@ -2034,12 +2021,14 @@ apply_handle_delete_internal(ApplyExecutionData *edata,
EPQState epqstate;
TupleTableSlot *localslot;
bool found;
+ Oid usableIndexOid = usable_indexoid_internal(edata, relinfo);

EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1);
ExecOpenIndices(relinfo, false);

- found = FindReplTupleInLocalRel(estate, localrel, remoterel,
- remoteslot, &localslot);
+
+ found = FindReplTupleInLocalRel(estate, localrel, usableIndexOid,
+ remoterel, remoteslot, &localslot);

24a.
Excess blank line above FindReplTupleInLocalRel call.

~

24b.
This code is almost same in function handle_update_internal(), except
the wrapping of the params is different. Better to keep everything
consistent looking.

~~~

25. src/backend/replication/logical/worker.c - usable_indexoid_internal

+/*
+ * Decide whether we can pick an index for the relinfo (e.g., the relation)
+ * we're actually deleting/updating from. If it is a child partition of
+ * edata->targetRelInfo, find the index on the partition.
+ */
+static Oid
+usable_indexoid_internal(ApplyExecutionData *edata, ResultRelInfo *relinfo)

I'm not sure is this can maybe return InvalidOid? The function comment
should clarify it.

~~~

26.

I might be mistaken, but somehow I feel this function can be
simplified. e.g. If you have a var 'relmapentry' and let the normal
table use the initial value of that. Then I think you only need to
test for the partitioned table and reassign that var as appropriate.
It also removes the need for having 'usableIndexOid' var.

FOR EXAMPLE,

static Oid
usable_indexoid_internal(ApplyExecutionData *edata, ResultRelInfo *relinfo)
{
ResultRelInfo *targetResultRelInfo = edata->targetRelInfo;
LogicalRepRelMapEntry *relmapentry = edata->targetRel;
Oid targetrelid = targetResultRelInfo->ri_RelationDesc->rd_rel->oid;
Oid localrelid = relinfo->ri_RelationDesc->rd_id;

if (targetrelid != localrelid)
{
/*
* Target is a partitioned table, so find relmapentry of the partition.
*/
TupleConversionMap *map = relinfo->ri_RootToPartitionMap;
AttrMap *attrmap = map ? map->attrMap : NULL;
LogicalRepPartMapEntry *part_entry =
logicalrep_partition_open(relmapentry, relinfo->ri_RelationDesc,
attrmap);

Assert(targetResultRelInfo->ri_RelationDesc->rd_rel->relkind ==
RELKIND_PARTITIONED_TABLE);

relmapentry = part_entry->relmapentry;
}
return relmapentry->usableIndexOid;
}

~~~

27.

+ /*
+ * Target is a partitioned table, get the index oid the partition.
+ */

SUGGESTION
Target is a partitioned table, so get the index oid of the partition.

or (see the example of comment @26)

~~~

28. src/backend/replication/logical/worker.c - FindReplTupleInLocalRel

@@ -2093,12 +2125,11 @@ FindReplTupleInLocalRel(EState *estate,
Relation localrel,

*localslot = table_slot_create(localrel, &estate->es_tupleTable);

I think this might have been existing functionality...

The comment says "* Local tuple, if found, is returned in
'*localslot'." But the code is unconditionally doing
table_slot_create() before it even knows if a tuple was found or not.
So what about when it is NOT found - in that case shouldn't there be
some cleaning up that (unused?) table slot that got unconditionally
created?

~~~

29. src/backend/replication/logical/worker.c - apply_handle_tuple_routing

@@ -2202,13 +2233,17 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
* suitable partition.
*/
{
+ LogicalRepRelMapEntry *entry;
TupleTableSlot *localslot;
ResultRelInfo *partrelinfo_new;
bool found;

+ entry = &part_entry->relmapentry;

Maybe just do this assignment at the entry declaration time?

~~~

30.

/* Get the matching local tuple from the partition. */
found = FindReplTupleInLocalRel(estate, partrel,
- &part_entry->remoterel,
+ part_entry->relmapentry.usableIndexOid,
+ &entry->remoterel,
remoteslot_part, &localslot);
Why not use the new 'entry' var just assigned instead of repeating
part_entry->relmapentry?

SUGGESTION
found = FindReplTupleInLocalRel(estate, partrel,
entry->usableIndexOid,
&entry->remoterel,
remoteslot_part, &localslot);

~~~

31.

+ slot_modify_data(remoteslot_part, localslot, entry,
newtup);

Unnecessary wrapping.

======

32. src/include/replication/logicalrelation.h

+typedef struct LogicalRepPartMapEntry
+{
+ Oid partoid; /* LogicalRepPartMap's key */
+ LogicalRepRelMapEntry relmapentry;
+} LogicalRepPartMapEntry;

IIUC this struct has been moved from relation.c to here. But I think
there was a large comment about this struct which maybe needs to be
moved with it (see the original relation.c).

/*
* Partition map (LogicalRepPartMap)
*
* When a partitioned table is used as replication target, replicated
* operations are actually performed on its leaf partitions, which requires
* the partitions to also be mapped to the remote relation. Parent's entry
* (LogicalRepRelMapEntry) cannot be used as-is for all partitions, because
* individual partitions may have different attribute numbers, which means
* attribute mappings to remote relation's attributes must be maintained
* separately for each partition.
*/

======

33. .../subscription/t/032_subscribe_use_index.pl

Typo "MULTIPILE"

This typo occurs several times...

e.g. # Testcase start: SUBSCRIPTION USES INDEX MODIFIES MULTIPILE ROWS
e.g. # Testcase end: SUBSCRIPTION USES INDEX MODIFIES MULTIPILE ROWS
e.g. # Testcase start: SUBSCRIPTION USES INDEX WITH MULTIPILE COLUMNS
e.g. # Testcase end: SUBSCRIPTION USES INDEX MODIFIES MULTIPILE ROWS

~~~

34.

# Basic test where the subscriber uses index
# and only touches multiple rows

What does "only ... multiple" mean?

This occurs several times also.

~~~

35.

+# wait for initial table synchronization to finish
+$node_subscriber->wait_for_subscription_sync;
+$node_subscriber->wait_for_subscription_sync;
+$node_subscriber->wait_for_subscription_sync;

That triple wait looks unusual. Is it deliberate?

------
Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2022-08-24 09:39:06 Re: [RFC] building postgres with meson - v11
Previous Message Julien Rouhaud 2022-08-24 08:59:23 Re: ICU for global collation