Quick Links

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From:	Önder Kalacı <onderkalaci(at)gmail(dot)com>
To:	Peter Smith <smithpb2250(at)gmail(dot)com>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date:	2022-09-19 15:32:03
Message-ID:	CACawEhXnTcXBOTofptkgSBOyD81Pohd7MSfFaW0SKo-0oKrCJg@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi Peter,

Thanks again for the review, see my comments below:

>
> ======
>
> 1. Commit message
>
> It is often not feasible to use `REPLICA IDENTITY FULL` on the publication
> because it leads to full table scan per tuple change on the subscription.
> This makes `REPLICA IDENTITY FULL` impracticable -- probably other than
> some small number of use cases.
>
> ~
>
> The "often not feasible" part seems repeated by the "impracticable" part.
>

> SUGGESTION
> Using `REPLICA IDENTITY FULL` on the publication leads to a full table
> scan per tuple change on the subscription. This makes `REPLICA
> IDENTITY FULL` impracticable -- probably other than some small number
> of use cases.
>
> ~~~
>

Sure, this is easier to follow, updated.

>
> 2.
>
> The Majority of the logic on the subscriber side already exists in
> the code.
>
> "Majority" -> "majority"
>
>
fixed

> ~~~
>
> 3.
>
> The ones familiar
> with this part of the code could realize that the sequential scan
> code on the subscriber already implements the `tuples_equal()`
> function.
>
> SUGGESTION
> Anyone familiar with this part of the code might recognize that...
>
> ~~~
>

Yes, this is better, applied

>
> 4.
>
> In short, the changes on the subscriber is mostly
> combining parts of (unique) index scan and sequential scan codes.
>
> "is mostly" -> "are mostly"
>
> ~~~
>
>
applied

> 5.
>
> From the performance point of view, there are few things to note.
>
> "are few" -> "are a few"
>
>
applied

> ======
>
> 6. src/backend/executor/execReplication.c - build_replindex_scan_key
>
> +static int
> build_replindex_scan_key(ScanKey skey, Relation rel, Relation idxrel,
> TupleTableSlot *searchslot)
> {
> - int attoff;
> + int index_attoff;
> + int scankey_attoff = 0;
>
> Should it be called 'skey_attoff' for consistency with the param 'skey'?
>
>
That looks better, updated

> ~~~
>
> 7.
>
> Oid operator;
> Oid opfamily;
> RegProcedure regop;
> - int pkattno = attoff + 1;
> - int mainattno = indkey->values[attoff];
> - Oid optype = get_opclass_input_type(opclass->values[attoff]);
> + int table_attno = indkey->values[index_attoff];
> + Oid optype = get_opclass_input_type(opclass->values[index_attoff]);
>
> Maybe the 'optype' should be adjacent to the other Oid opXXX
> declarations just to keep them all together?
>

I do not have any preference on this. Although I do not see such a strong
pattern in the code, I have no objection to doing so changed.

~~~
>
> 8.
>
> + if (!AttributeNumberIsValid(table_attno))
> + {
> + IndexInfo *indexInfo PG_USED_FOR_ASSERTS_ONLY;
> +
> + /*
> + * There are two cases to consider. First, if the index is a primary or
> + * unique key, we cannot have any indexes with expressions. So, at this
> + * point we are sure that the index we are dealing with is not these.
> + */
> + Assert(RelationGetReplicaIndex(rel) != RelationGetRelid(idxrel) &&
> + RelationGetPrimaryKeyIndex(rel) != RelationGetRelid(idxrel));
> +
> + /*
> + * At this point, we are also sure that the index is not consisting
> + * of only expressions.
> + */
> +#ifdef USE_ASSERT_CHECKING
> + indexInfo = BuildIndexInfo(idxrel);
> + Assert(!IndexOnlyOnExpression(indexInfo));
> +#endif
>
> I was a bit confused by the comment. IIUC the code has already called
> the FilterOutNotSuitablePathsForReplIdentFull some point prior so all
> the unwanted indexes are already filtered out. Therefore these
> assertions are just for no reason, other than sanity checking that
> fact, right? If my understand is correct perhaps a simpler single
> comment is possible:
>

Yes, these are for sanity check

>
> SUGGESTION (or something like this)
> This attribute is an expression, however
> FilterOutNotSuitablePathsForReplIdentFull was called earlier during
> [...] and the indexes comprising only expressions have already been
> eliminated. We sanity check this now. Furthermore, because primary key
> and unique key indexes can't include expressions we also sanity check
> the index is neither of those kinds.
>
> ~~~
>

I agree that we can improve comments here. I incorporated your suggestion
as well.

>
> 9.
> - return hasnulls;
> + /* We should always use at least one attribute for the index scan */
> + Assert (scankey_attoff > 0);
>
> SUGGESTION
> There should always be at least one attribute for the index scan.
>

applied

>
> ~~~
>
> 10. src/backend/executor/execReplication.c - RelationFindReplTupleByIndex
>
> ScanKeyData skey[INDEX_MAX_KEYS];
> IndexScanDesc scan;
> SnapshotData snap;
> TransactionId xwait;
> Relation idxrel;
> bool found;
> TypeCacheEntry **eq = NULL; /* only used when the index is not unique */
> bool indisunique;
> int scankey_attoff;
>
> 10a.
> Should 'scankey_attoff' be called 'skey_attoff' for consistency with
> the 'skey' array?
>

Yes, it makes sense as you suggested on build_replindex_scan_key

>
> ~
>
> 10b.
> Also, it might be tidier to declare the 'skey_attoff' adjacent to the
> 'skey'.
>

moved

>
> ======
>
> 11. src/backend/replication/logical/relation.c
>
> For LogicalRepPartMap, I was wondering if it should keep a small
> comment to xref back to the long comment which was moved to
> logicalreplication.h
>
> e.g.
> /* Refer to the LogicalRepPartMapEntry comment in logicalrelation.h */
>

Could work, added. We already have the xref the other way around
(LogicalRepPartMapEntry->LogicalRepPartMap)

> ~~~
>
> 12. src/backend/replication/logical/relation.c - logicalrep_partition_open
>
> + /*
> + * Finding a usable index is an infrequent task. It occurs when
> + * an operation is first performed on the relation, or after
> + * invalidation of the relation cache entry (e.g., such as ANALYZE).
> + */
> + entry->usableIndexOid = FindLogicalRepUsableIndex(entry->localrel,
> remoterel);
> entry->localrelvalid = true;
>
> Should there be a blank line between those assignments? (just for
> consistency with the other code of this patch in a later function that
> does exactly the same assignments).
>

done

>
> ~~~
>
> 13. src/backend/replication/logical/relation.c -
> FilterOutNotSuitablePathsForReplIdentFull
>
> Not sure about this function name. Maybe should be something like
> 'FilterOutUnsuitablePathsForReplIdentFull', or just
> 'SuitablePathsForReplIdentFull'
>
> ~~~
>

I think I'll go with a slight modification of your
suggestion: SuitablePathsForRepIdentFull

>
> 14.
>
> + else
> + {
> + Relation indexRelation;
> + IndexInfo *indexInfo;
> + bool is_btree;
> + bool is_partial;
> + bool is_only_on_expression;
>
> Is that another var that could be renamed 'idxoid' like all the others?
>
> seems so, updated

> ~~~
>
> 15. src/backend/replication/logical/relation.c -
> GetCheapestReplicaIdentityFullPath
>
> + typentry = lookup_type_cache(attr->atttypid,
> + TYPECACHE_EQ_OPR_FINFO);
>
> Seems unnecessary wrapping.
>
> fixed

> ~~~
>
> 15.
>
> + /*
> + * Currently it is not possible for planner to pick a
> + * partial index or indexes only on expressions. We
> + * still want to be explicit and eliminate such
> + * paths proactively.
> ...
> ...
> + */
>
> This large comment seems unusually skinny. Needs pg_indent.
>
>
Ok, it has been a while that I have not run pg_indent. Now did and this
comment is fixed as well

> ~~~
>
> 16. src/backend/replication/logical/worker.c - check_relation_updatable
>
> @@ -1753,7 +1738,7 @@ check_relation_updatable(LogicalRepRelMapEntry *rel)
> * We are in error mode so it's fine this is somewhat slow. It's better to
> * give user correct error.
> */
> - if (OidIsValid(GetRelationIdentityOrPK(rel->localrel)))
> + if (OidIsValid(rel->usableIndexOid))
>
> The original comment about it being "somewhat slow" does not seem
> relevant anymore because it is no longer calling a function in this
> condition.
>
>
Fixed (also a similar comment raised in another review)

> ~~~
>
> 17. src/backend/replication/logical/worker.c - usable_indexoid_internal
>
> + relmapentry = &(part_entry->relmapentry);
>
> The parentheses seem overkill, and code is not written like this
> elsewhere in the same patch.
>

true, no need, removed the parentheses

> ~~~
>
> 18. src/backend/replication/logical/worker.c - apply_handle_tuple_routing
>
> @@ -2202,13 +2225,15 @@ apply_handle_tuple_routing(ApplyExecutionData
> *edata,
> * suitable partition.
> */
> {
> + LogicalRepRelMapEntry *entry = &part_entry->relmapentry;
>
> I think elsewhere in the patch the same variable is called
> 'relmapentry' (which seems a bit better than just 'entry')
>
>
true, it used as relmapentry in other place(s), and in this context entry
is confusing. So, changed to relmapentry.

> ======
>
> 19. .../subscription/t/032_subscribe_use_index.pl
>
> +# ANALYZING child will change the index used on child_1 and going to
> use index_on_child_1_b
> +$node_subscriber->safe_psql('postgres', "ANALYZE child_1");
>
> 19a.
> "ANALYZING child" ? Should that be worded differently? There is
> nothing named 'child' that I could see.
>
>
Do you mean it should be "child_1"? Tha is the name of the table. I
updated the comment, let me know if it is still confusing.

~
>
> 19b.
> "and going to use" ? wording ? "which will be used for " ??
>
>
Rewording the comment below, is that better?

# ANALYZING child_1 will change the index used on the table and
# UPDATE/DELETEs on the subscriber are going to use index_on_child_1_b

I also attached v_11 of the patch.

Thanks,
Onder Kalaci

Attachment	Content-Type	Size
v11_0001_use_index_on_subs_when_pub_rep_ident_full.patch	application/octet-stream	78.5 KB

In response to

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher at 2022-09-16 00:27:30 from Peter Smith

Responses

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher at 2022-09-20 02:22:40 from Peter Smith
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher at 2022-09-20 08:25:33 from Peter Smith

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Jaime Casanova	2022-09-19 16:04:03	Re: START_REPLICATION SLOT causing a crash in an assert build
Previous Message	Önder Kalacı	2022-09-19 15:31:55	Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher