RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From: "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Önder Kalacı' <onderkalaci(at)gmail(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>
Subject: RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date: 2022-09-28 05:57:41
Message-ID: TYAPR01MB58663E624B35CD2C5BEC0595F5549@TYAPR01MB5866.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear Önder:

Thank you for updating patch!
Your documentation seems OK, and I could not find any other places to be added

Followings are my comments.

====
01 relation.c - general

Many files are newly included.
I was not sure but some codes related with planner may be able to move to src/backend/optimizer/plan.
How do you and any other one think?

02 relation.c - FindLogicalRepUsableIndex

```
+/*
+ * Returns an index oid if we can use an index for the apply side. If not,
+ * returns InvalidOid.
+ */
+static Oid
+FindLogicalRepUsableIndex(Relation localrel, LogicalRepRelation *remoterel)
```

I grepped files, but I cannot find the word "apply side". How about "subscriber" instead?

03 relation.c - FindLogicalRepUsableIndex

```
+ /* Simple case, we already have an identity or pkey */
+ idxoid = GetRelationIdentityOrPK(localrel);
+ if (OidIsValid(idxoid))
+ return idxoid;
+
+ /*
+ * If index scans are disabled, use a sequential scan.
+ *
+ * Note that we still allowed index scans above when there is a primary
+ * key or unique index replica identity, but that is the legacy behaviour
+ * (even when enable_indexscan is false), so we hesitate to move this
+ * enable_indexscan check to be done earlier in this function.
+ */
+ if (!enable_indexscan)
+ return InvalidOid;
```

a.
I think "identity or pkey" should be "replica identity key or primary key" or "RI or PK"

b.
Later part should be at around GetRelationIdentityOrPK.

04 relation.c - FindUsableIndexForReplicaIdentityFull

```
+ MemoryContext usableIndexContext;
...
+ usableIndexContext = AllocSetContextCreate(CurrentMemoryContext,
+ "usableIndexContext",
+ ALLOCSET_DEFAULT_SIZES);
```

I grepped other sources, and I found that the name like "tmpcxt" is used for the temporary MemoryContext.

05 relation.c - SuitablePathsForRepIdentFull

```
+ indexRelation = index_open(idxoid, AccessShareLock);
+ indexInfo = BuildIndexInfo(indexRelation);
+ is_btree = (indexInfo->ii_Am == BTREE_AM_OID);
+ is_partial = (indexInfo->ii_Predicate != NIL);
+ is_only_on_expression = IndexOnlyOnExpression(indexInfo);
+ index_close(indexRelation, NoLock);
```

Why the index is closed with NoLock? AccessShareLock is acquired, so shouldn't same lock be released?

06 relation.c - GetCheapestReplicaIdentityFullPath

IIUC a query like "SELECT tbl WHERE attr1 = $1 AND attr2 = $2 ... AND attrN = $N" is emulated, right?
you can write explicitly it as comment

07 relation.c - GetCheapestReplicaIdentityFullPath

```
+ Path *path = (Path *) lfirst(lc);
+ Oid idxoid = GetIndexOidFromPath(path);
+
+ if (!OidIsValid(idxoid))
+ {
+ /* Unrelated Path, skip */
+ suitableIndexList = lappend(suitableIndexList, path);
+ }
```

I was not clear about here. IIUC in the function we want to extract "good" scan plan and based on that the cheapest one is chosen.
GetIndexOidFromPath() seems to return InvalidOid when the input path is not index scan, so why is it appended to the suitable list?

===
08 worker.c - usable_indexoid_internal

I think this is not "internal" function, such name should be used for like "apply_handle_commit" - "apply_handle_commit_internal", or "apply_handle_insert" - "apply_handle_insert_internal".
How about "get_usable_index" or something?

09 worker.c - usable_indexoid_internal

```
+ Oid targetrelid = targetResultRelInfo->ri_RelationDesc->rd_rel->oid;
+ Oid localrelid = relinfo->ri_RelationDesc->rd_id;
+
+ if (targetrelid != localrelid)
```

I think these lines are very confusable.
IIUC targetrelid is corresponded to the "parent", and localrelid is corresponded to the "child", right?
How about changing name to "partitionedoid" and "leadoid" or something?

===
10 032_subscribe_use_index.pl

```
# create tables pub and sub
$node_publisher->safe_psql('postgres',
"CREATE TABLE test_replica_id_full (x int)");
$node_publisher->safe_psql('postgres',
"ALTER TABLE test_replica_id_full REPLICA IDENTITY FULL;");
$node_subscriber->safe_psql('postgres',
"CREATE TABLE test_replica_id_full (x int)");
$node_subscriber->safe_psql('postgres',
"CREATE INDEX test_replica_id_full_idx ON test_replica_id_full(x)");
```

In many places same table is defined, altered as "REPLICA IDENTITY FULL", and index is created.
Could you combine them into function?

11 032_subscribe_use_index.pl

```
# wait until the index is used on the subscriber
$node_subscriber->poll_query_until(
'postgres', q{select (idx_scan = 1) from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_idx';}
) or die "Timed out while waiting for check subscriber tap_sub_rep_full_0 updates one row via index";
```

In many places this check is done. Could you combine them into function?

12 032_subscribe_use_index.pl

```
# create pub/sub
$node_publisher->safe_psql('postgres',
"CREATE PUBLICATION tap_pub_rep_full FOR TABLE test_replica_id_full");
$node_subscriber->safe_psql('postgres',
"CREATE SUBSCRIPTION tap_sub_rep_full CONNECTION '$publisher_connstr application_name=$appname' PUBLICATION tap_pub_rep_full"
);
```

Same as above

13 032_subscribe_use_index.pl

```
# cleanup pub
$node_publisher->safe_psql('postgres', "DROP PUBLICATION tap_pub_rep_full");
$node_publisher->safe_psql('postgres', "DROP TABLE test_replica_id_full");
# cleanup sub
$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub_rep_full");
$node_subscriber->safe_psql('postgres', "DROP TABLE test_replica_id_full");
```

Same as above

14 032_subscribe_use_index.pl - SUBSCRIPTION USES INDEX

```
# make sure that the subscriber has the correct data
my $result = $node_subscriber->safe_psql('postgres',
"SELECT sum(x) FROM test_replica_id_full");
is($result, qq(212), 'ensure subscriber has the correct data at the end of the test');

$node_subscriber->poll_query_until(
'postgres', q{select sum(x)=212 AND count(*)=21 AND count(DISTINCT x)=20 FROM test_replica_id_full;}
) or die "ensure subscriber has the correct data at the end of the test";
```

I think first one is not needed.

15 032_subscribe_use_index.pl - SUBSCRIPTION USES INDEX UPDATEs MULTIPLE ROWS

```
# insert some initial data within the range 0-20
$node_publisher->safe_psql('postgres',
"INSERT INTO test_replica_id_full SELECT i%20 FROM generate_series(0,1000)i;"
);
```

I think data is within the range 0-19.
(There are some mistakes)

===
16 test/subscription/meson.build

Your test 't/032_subscribe_use_index.pl' must be added in the 'tests' for meson build system.
(I checked on my env, and your test works well)

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2022-09-28 06:00:34 Re: A doubt about a newly added errdetail
Previous Message Michael Paquier 2022-09-28 05:56:05 Re: Extend win32 error codes to errno mapping in win32error.c