Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Önder Kalacı <onderkalaci(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, Marco Slot <marco(dot)slot(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date: 2023-03-03 05:10:44
Message-ID: CAHut+PtS5XpCM_wUXwJHLBOBVgZ8E=T8JtYsXHUQCzWASUyY2A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here are some review comments for v27-0001 (not the tests)

======
Commit Message

1.
There is no smart mechanism to pick the index. Instead, we choose
the first index that fulfils the requirements mentioned above.

~

1a.
I think this paragraph should immediately follow the earlier one
("With this patch...") which talked about the index requirements.

~

1b.
Slight rewording

SUGGESTION
If there is more than one index that satisfies these requirements, we
just pick the first one.

======
doc/src/sgml/logical-replication.sgml

2.
A published table must have a “replica identity” configured in order
to be able to replicate UPDATE and DELETE operations, so that
appropriate rows to update or delete can be identified on the
subscriber side. By default, this is the primary key, if there is one.
Another unique index (with certain additional requirements) can also
be set to be the replica identity. When replica identity FULL is
specified, indexes can be used on the subscriber side for searching
the rows. These indexes should be btree, non-partial and have at least
one column reference (e.g., should not consist of only expressions).
These restrictions on the non-unique index properties are in essence
the same restrictions that are enforced for primary keys. Internally,
we follow the same approach for supporting index scans within logical
replication scope. If there are no such suitable indexes, the search
on the subscriber s ide can be very inefficient, therefore replica
identity FULL should only be used as a fallback if no other solution
is possible. If a replica identity other than “full” is set on the
publisher side, a replica identity comprising the same or fewer
columns must also be set on the subscriber side. See REPLICA IDENTITY
for details on how to set the replica identity. If a table without a
replica identity is added to a publication that replicates UPDATE or
DELETE operations then subsequent UPDATE or DELETE operations will
cause an error on the publisher. INSERT operations can proceed
regardless of any replica identity.

~

2a.
IMO the <quote>replica identity</quote> in the first sentence should
be changed to be <firstterm>replica identity</firstterm>

~

2b.
Typo: "subscriber s ide" --> "subscriber side"

~

2c.
There is still one remaining "full" in this text. I think ought to be
changed to <literal>FULL</literal> to match the others.

======
src/backend/executor/execReplication.c

3. IdxIsRelationIdentityOrPK

+/*
+ * Given a relation and OID of an index, returns true if the
+ * index is relation's primary key's index or relation's
+ * replica identity index.
+ *
+ * Returns false otherwise.
+ */
+bool
+IdxIsRelationIdentityOrPK(Relation rel, Oid idxoid)
+{
+ Assert(OidIsValid(idxoid));
+
+ return RelationGetReplicaIndex(rel) == idxoid ||
+ RelationGetPrimaryKeyIndex(rel) == idxoid;
}

~

Since the function name mentions RI (1st) and then PK (2nd), and since
the implementation also has the same order, I think the function
comment should use the same consistent order when describing what it
does.

======
src/backend/replication/logical/relation.c

4. FindUsableIndexForReplicaIdentityFull

+/*
+ * Returns an index oid if there is an index that can be used
+ * via the apply worker. The index should be btree, non-partial
+ * and have at least one column reference (e.g., should
+ * not consist of only expressions). The limitations arise from
+ * RelationFindReplTupleByIndex(), which is designed to handle
+ * PK/RI and these limitations are inherent to PK/RI.
+ *
+ * There are no fundamental problems for supporting non-btree
+ * and/or partial indexes. We should mostly relax the limitations
+ * in RelationFindReplTupleByIndex().
+ *
+ * If no suitable index is found, returns InvalidOid.
+ *
+ * Note that this is not a generic function, it expects REPLICA
+ * IDENTITY FULL for the remote relation.
+ */

~

4a.
Minor rewording of 1st sentence.

BEFORE
Returns an index oid if there is an index that can be used via the apply worker.

SUGGESTION
Returns the oid of an index that can be used via the apply worker.

~

4b.
+ * There are no fundamental problems for supporting non-btree
+ * and/or partial indexes. We should mostly relax the limitations
+ * in RelationFindReplTupleByIndex().

I think this paragraph should come later in the comment (just before
the Note) and should also have "XXX" prefix to indicate it is some
implementation note for future versions.

~~~

5. GetRelationIdentityOrPK

+/*
+ * Get replica identity index or if it is not defined a primary key.
+ *
+ * If neither is defined, returns InvalidOid
+ */
+Oid
+GetRelationIdentityOrPK(Relation rel)
+{
+ Oid idxoid;
+
+ idxoid = RelationGetReplicaIndex(rel);
+
+ if (!OidIsValid(idxoid))
+ idxoid = RelationGetPrimaryKeyIndex(rel);
+
+ return idxoid;
+}

This is really very similar code to the other new function called
IdxIsRelationIdentityOrPK. I wondered if such similar functions could
be defined together.

~~~

6. FindLogicalRepUsableIndex

+/*
+ * Returns an index oid if we can use an index for subscriber. If not,
+ * returns InvalidOid.
+ */

SUGGESTION
Returns the oid of an index that can be used by a subscriber.
Otherwise, returns InvalidOid.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2023-03-03 05:36:23 Re: Add pg_walinspect function with block info columns
Previous Message Andrey Borodin 2023-03-03 04:52:49 Re: [EXTERNAL] Re: Support load balancing in libpq