RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From: "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Önder Kalacı' <onderkalaci(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date: 2022-09-15 12:56:26
Message-ID: TYAPR01MB5866DAF75EDCB95E81145395F5499@TYAPR01MB5866.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear Önder,

Thank you for proposing good feature. I'm also interested in the patch,
So I started to review this. Followings are initial comments.

===
For execRelation.c

01. RelationFindReplTupleByIndex()

```
/* Start an index scan. */
InitDirtySnapshot(snap);
- scan = index_beginscan(rel, idxrel, &snap,
- IndexRelationGetNumberOfKeyAttributes(idxrel),
- 0);

/* Build scan key. */
- build_replindex_scan_key(skey, rel, idxrel, searchslot);
+ scankey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);

+ scan = index_beginscan(rel, idxrel, &snap, scankey_attoff, 0);
```

I think "/* Start an index scan. */" should be just above index_beginscan().

===
For worker.c

02. sable_indexoid_internal()

```
+ * Note that if the corresponding relmapentry has InvalidOid usableIndexOid,
+ * the function returns InvalidOid.
+ */
+static Oid
+usable_indexoid_internal(ApplyExecutionData *edata, ResultRelInfo *relinfo)
```

"InvalidOid usableIndexOid" should be "invalid usableIndexOid,"

03. check_relation_updatable()

```
* We are in error mode so it's fine this is somewhat slow. It's better to
* give user correct error.
*/
- if (OidIsValid(GetRelationIdentityOrPK(rel->localrel)))
+ if (OidIsValid(rel->usableIndexOid))
{
```

Shouldn't we change the above comment to? The check is no longer slow.

===
For relation.c

04. GetCheapestReplicaIdentityFullPath()

```
+static Path *
+GetCheapestReplicaIdentityFullPath(Relation localrel)
+{
+ PlannerInfo *root;
+ Query *query;
+ PlannerGlobal *glob;
+ RangeTblEntry *rte;
+ RelOptInfo *rel;
+ int attno;
+ RangeTblRef *rt;
+ List *joinList;
+ Path *seqScanPath;
```

I think the part that constructs dummy-planner state can be move to another function
because that part is not meaningful for this.
Especially line 824-846 can.

===
For 032_subscribe_use_index.pl

05. general

```
+# insert some initial data within the range 0-1000
+$node_publisher->safe_psql('postgres',
+ "INSERT INTO test_replica_id_full SELECT i%20 FROM generate_series(0,1000)i;"
+);
```

It seems that the range of initial data seems [0, 19].
Same mistake-candidates are found many place.

06. general

```
+# updates 1000 rows
+$node_publisher->safe_psql('postgres',
+ "UPDATE test_replica_id_full SET x = x + 1 WHERE x = 15;");
```

Only 50 tuples are modified here.
Same mistake-candidates are found many place.

07. general

```
+# we check if the index is used or not
+$node_subscriber->poll_query_until(
+ 'postgres', q{select (idx_scan = 200) from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_idx';}
+) or die "Timed out while waiting for check subscriber tap_sub_rep_full_3 updates 200 rows via index";
```
The query will be executed until the index scan is finished, but it may be not commented.
How about changing it to "we wait until the index used on the subscriber-side." or something?
Same comments are found in many place.

08. test related with ANALYZE

```
+# Testcase start: SUBSCRIPTION CAN UPDATE THE INDEX IT USES AFTER ANALYZE - PARTITIONED TABLE
+# ====================================================================
```

"Testcase start:" should be "Testcase end:" here.

09. general

In some tests results are confirmed but in other test they are not.
I think you can make sure results are expected in any case if there are no particular reasons.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message houzj.fnst@fujitsu.com 2022-09-15 12:57:07 RE: why can't a table be part of the same publication as its schema
Previous Message Alvaro Herrera 2022-09-15 12:50:29 Re: Avoid use deprecated Windows Memory API