Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Önder Kalacı <onderkalaci(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, Marco Slot <marco(dot)slot(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date: 2023-03-06 04:41:54
Message-ID: CAHut+Pv6KaBUdJ5JfFPQn8=yYs3aOvbL4xbAtH+HX+i3uJ+p3g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here are some review comments for v28-0001.

======
doc/src/sgml/logical-replication.sgml

1.
A published table must have a replica identity configured in order to
be able to replicate UPDATE and DELETE operations, so that appropriate
rows to update or delete can be identified on the subscriber side. By
default, this is the primary key, if there is one. Another unique
index (with certain additional requirements) can also be set to be the
replica identity. When replica identity FULL is specified, indexes can
be used on the subscriber side for searching the rows. These indexes
should be btree, non-partial and have at least one column reference
(e.g., should not consist of only expressions). These restrictions on
the non-unique index properties are in essence the same restrictions
that are enforced for primary keys. Internally, we follow the same
approach for supporting index scans within logical replication scope.
If there are no such suitable indexes, the search on the subscriber
side can be very inefficient, therefore replica identity FULL should
only be used as a fallback if no other solution is possible. If a
replica identity other than full is set on the publisher side, a
replica identity comprising the same or fewer columns must also be set
on the subscriber side. See REPLICA IDENTITY for details on how to set
the replica identity. If a table without a replica identity is added
to a publication that replicates UPDATE or DELETE operations then
subsequent UPDATE or DELETE operations will cause an error on the
publisher. INSERT operations can proceed regardless of any replica
identity.

~

1a.
Changes include:
"should" --> "must"
"e.g." --> "i.e."

BEFORE
These indexes should be btree, non-partial and have at least one
column reference (e.g., should not consist of only expressions).

SUGGESTION
Candidate indexes must be btree, non-partial, and have at least one
column reference (i.e., cannot consist of only expressions).

~

1b.
The fix for my v27 review comment #2b (changing "full" to FULL) was
not made correctly. It should be uppercase FULL, not full:
"other than full" --> "other than FULL"

======
src/backend/executor/execReplication.c

2.
/*
* Setup a ScanKey for a search in the relation 'rel' for a tuple 'key' that
* is setup to match 'rel' (*NOT* idxrel!).
*
- * Returns whether any column contains NULLs.
+ * Returns how many columns should be used for the index scan.
+ *
+ * This is not generic routine, it expects the idxrel to be
+ * a btree, non-partial and have at least one column
+ * reference (e.g., should not consist of only expressions).
*
- * This is not generic routine, it expects the idxrel to be replication
- * identity of a rel and meet all limitations associated with that.
+ * By definition, replication identity of a rel meets all
+ * limitations associated with that. Note that any other
+ * index could also meet these limitations.
*/
-static bool
+static int
build_replindex_scan_key(ScanKey skey, Relation rel, Relation idxrel,
TupleTableSlot *searchslot)

~

"(e.g., should not consist of only expressions)" --> "(i.e., cannot
consist of only expressions)"

======
src/backend/replication/logical/relation.c

3. FindUsableIndexForReplicaIdentityFull

+/*
+ * Returns the oid of an index that can be used via the apply
+ * worker. The index should be btree, non-partial and have at
+ * least one column reference (e.g., should not consist of
+ * only expressions). The limitations arise from
+ * RelationFindReplTupleByIndex(), which is designed to handle
+ * PK/RI and these limitations are inherent to PK/RI.

The 2nd sentence of this comment should match the same changes in the
Commit message --- "must not" instead of "should not", "i.e." instead
of "e.g.", etc. See the review comment #1a above.

~~~

4. IdxIsRelationIdentityOrPK

+/*
+ * Given a relation and OID of an index, returns true if the
+ * index is relation's replica identity index or relation's
+ * primary key's index.
+ *
+ * Returns false otherwise.
+ */
+bool
+IdxIsRelationIdentityOrPK(Relation rel, Oid idxoid)
+{
+ Assert(OidIsValid(idxoid));
+
+ return GetRelationIdentityOrPK(rel) == idxoid;
+}

I think you've "simplified" this function in v28 but AFAICT now it has
a different logic to v27.

PREVIOUSLY it was coded like
+ return RelationGetReplicaIndex(rel) == idxoid ||
+ RelationGetPrimaryKeyIndex(rel) == idxoid;

You can see if 'idxoid' did NOT match RI but if it DID match PK
previously it would return true. But now in that scenario, it won't
even check the PK if there was a valid RI. So it might return false
when previously it returned true. Is it deliberate?

======
.../subscription/t/032_subscribe_use_index.pl

5.
+# Testcase start: SUBSCRIPTION USES INDEX WITH PUB/SUB different data
+#
+# The subscriber has duplicate tuples that publisher does not have.
+# When publsher updates/deletes 1 row, subscriber uses indexes and
+# exactly updates/deletes 1 row.

"and exactly updates/deletes 1 row." --> "and updates/deletes exactly 1 row."

------
Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michał Kłeczek 2023-03-06 05:30:03 Re: Removing unneeded self joins
Previous Message Michael Paquier 2023-03-06 04:29:50 Re: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?