Quick Links

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From:	Önder Kalacı <onderkalaci(at)gmail(dot)com>
To:	"kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc:	Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>
Subject:	Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date:	2022-10-07 11:54:00
Message-ID:	CACawEhWUtLbhMF-AJyGOyw3L7Vs53gSBL-3kheG=TNS0PfkidA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi Kurado Hayato,

> In your patch:
>
> ```
> + /* Simple case, we already have a primary key or a replica
> identity index */
> + idxoid = GetRelationIdentityOrPK(localrel);
> + if (OidIsValid(idxoid))
> + return idxoid;
> +
> + /*
> + * If index scans are disabled, use a sequential scan.
> + *
> + * Note that we still allowed index scans above when there is a
> primary
> + * key or unique index replica identity, but that is the legacy
> behaviour
> + * (even when enable_indexscan is false), so we hesitate to move
> this
> + * enable_indexscan check to be done earlier in this function.
> + */
> ```
>
> And the paragraph " Note that we..." should be at above of
> GetRelationIdentityOrPK().
> Future readers will read the function from top to bottom,
> and when they read around GetRelationIdentityOrPK() they may be confused.
>
>
Ah, makes sense, now I applied your feedback (with some different wording).

> The inlined comment in the function has a similar comment. Is that clear
> > enough?
> > /* * Generate restrictions for all columns in the form of col_1 = $1 AND
> *
> > col_2 = $2 ... */
>
> Actually I missed it, but I still think that whole of emulated SQL should
> be clarified.

Alright, it makes sense. I added the emulated SQL to the function comment
of GetCheapestReplicaIdentityFullPath.

> And followings are the comments for v14. They are mainly about comments.
>
> ===
> 01. relation.c - logicalrep_rel_open
>
> ```
> + /*
> + * Finding a usable index is an infrequent task. It occurs
> when an
> + * operation is first performed on the relation, or after
> invalidation
> + * of the relation cache entry (e.g., such as ANALYZE).
> + */
> + entry->usableIndexOid =
> FindLogicalRepUsableIndex(entry->localrel, remoterel);
> ```
>
> I thought you can mention CREATE INDEX in the comment.
>
> According to your analysis [1] the relation cache will be invalidated if
> users do CREATE INDEX
> At that time the hash entry will be removed
> (logicalrep_relmap_invalidate_cb) and "usable" index
> will be checked again.
>

Yes, that is right. I think it makes sense to mention that as well. In
fact, I also decided to add such a test.

I realized that all tests use ANALYZE for re-calculation of the index. Now,
I added an explicit test that uses CREATE/DROP index to re-calculate the
index.

see # Testcase start: SUBSCRIPTION RE-CALCULATES INDEX AFTER CREATE/DROP
INDEX.

> ~~~
> 02. relation.c - logicalrep_partition_open
>
> ```
> + /*
> + * Finding a usable index is an infrequent task. It occurs when an
> + * operation is first performed on the relation, or after
> invalidation of
> + * the relation cache entry (e.g., such as ANALYZE).
> + */
> + entry->usableIndexOid = FindLogicalRepUsableIndex(partrel,
> remoterel);
> +
> ```
>
> Same as above
>
>
done

> ~~~
> 03. relation.c - GetIndexOidFromPath
>
> ```
> + if (path->pathtype == T_IndexScan || path->pathtype ==
> T_IndexOnlyScan)
> + {
> + IndexPath *index_sc = (IndexPath *) path;
> +
> + return index_sc->indexinfo->indexoid;
> + }
> ```
>
> I thought Assert(OidIsValid(indexoid)) may be added here. Or is it quite
> trivial?
>

Looking at the PG code, I couldn't see any place that asserts the
information. That seems like fundamental information that is never invalid.

Btw, even if it returns InvalidOid for some reason, we'd not be crashing.
Only not able to use any indexes, fall back to seq. scan.

>
> ~~~
> 04. relation.c - IndexOnlyOnExpression
>
> This method just returns "yes" or "no", so the name of method should be
> start "Has" or "Is".
>
> Yes, it seems like that is a common convention.

> ~~~
> 05. relation.c - SuitablePathsForRepIdentFull
>
> ```
> +/*
> + * Iterates over the input path list and returns another
> + * path list that includes index [only] scans where paths
> + * with non-btree indexes, partial indexes or
> + * indexes on only expressions have been removed.
> + */
> ```
>
> These lines seems to be around 60 columns. Could you expand around 80?
>

done

>
> ~~~
> 06. relation.c - SuitablePathsForRepIdentFull
>
> ```
> + Relation indexRelation;
> + IndexInfo *indexInfo;
> + bool is_btree;
> + bool is_partial;
> + bool is_only_on_expression;
> +
> + indexRelation = index_open(idxoid,
> AccessShareLock);
> + indexInfo = BuildIndexInfo(indexRelation);
> + is_btree = (indexInfo->ii_Am == BTREE_AM_OID);
> + is_partial = (indexInfo->ii_Predicate != NIL);
> + is_only_on_expression =
> IndexOnlyOnExpression(indexInfo);
> + index_close(indexRelation, AccessShareLock);
> +
> + if (is_btree && !is_partial &&
> !is_only_on_expression)
> + suitableIndexList =
> lappend(suitableIndexList, path);
> ```
>
> Please add a comment like "eliminating not suitable path" or something.
>

done

>
> ~~~
> 07. relation.c - GenerateDummySelectPlannerInfoForRelation
>
> ```
> +/*
> + * This is not a generic function. It is a helper function
> + * for GetCheapestReplicaIdentityFullPath. The function
> + * creates a dummy PlannerInfo for the given relationId
> + * as if the relation is queried with SELECT command.
> + */
> ```
>
> These lines seems to be around 60 columns. Could you expand around 80?
>

done

>
> ~~~
> 08. relation.c - FindLogicalRepUsableIndex
>
> ```
> +/*
> + * Returns an index oid if we can use an index for subscriber . If not,
> + * returns InvalidOid.
> + */
> ```
>
> "subscriber ." should be "subscriber.", blank is not needed.
>

fixed

>
> ~~~
> 09. worker.c - get_usable_indexoid
>
> ```
> +
> Assert(targetResultRelInfo->ri_RelationDesc->rd_rel->relkind ==
> + RELKIND_PARTITIONED_TABLE);
> ```
>
> I thought this assertion seems to be not needed, because this is
> completely same as the condition of if-statement.
>

Yes, the refactor we made in the previous iteration made this assertion
obsolete as you noted.

Attached v15, thanks for the reviews.

Thanks,
Onder KALACI

Attachment	Content-Type	Size
v15_0001_use_index_on_subs_when_pub_rep_ident_full.patch	application/octet-stream	84.3 KB

In response to

RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher at 2022-10-06 01:09:47 from kuroda.hayato@fujitsu.com

Responses

RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher at 2022-10-11 03:54:01 from kuroda.hayato@fujitsu.com

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Robert Haas	2022-10-07 11:59:08	Re: use has_privs_of_role() for pg_hba.conf
Previous Message	Maxim Orlov	2022-10-07 11:04:09	Re: Add 64-bit XIDs into PostgreSQL 15