Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From: Önder Kalacı <onderkalaci(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, Marco Slot <marco(dot)slot(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date: 2023-03-03 07:39:55
Message-ID: CACawEhVGnX_UBGMjn=rtZAgi03mWBujVFjukLhvFbk4Zjvi+dQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Peter, all

>
> ======
> Commit Message
>
> 1.
> There is no smart mechanism to pick the index. Instead, we choose
> the first index that fulfils the requirements mentioned above.
>
> ~
>
> 1a.
> I think this paragraph should immediately follow the earlier one
> ("With this patch...") which talked about the index requirements.
>
>
makes sense

>
> 1b.
> Slight rewording
>
> SUGGESTION
> If there is more than one index that satisfies these requirements, we
> just pick the first one.
>
>
applied

> ======
> doc/src/sgml/logical-replication.sgml
>
> 2.
> A published table must have a “replica identity” configured in order
> to be able to replicate UPDATE and DELETE operations, so that
> appropriate rows to update or delete can be identified on the
> subscriber side. By default, this is the primary key, if there is one.
> Another unique index (with certain additional requirements) can also
> be set to be the replica identity. When replica identity FULL is
> specified, indexes can be used on the subscriber side for searching
> the rows. These indexes should be btree, non-partial and have at least
> one column reference (e.g., should not consist of only expressions).
> These restrictions on the non-unique index properties are in essence
> the same restrictions that are enforced for primary keys. Internally,
> we follow the same approach for supporting index scans within logical
> replication scope. If there are no such suitable indexes, the search
> on the subscriber s ide can be very inefficient, therefore replica
> identity FULL should only be used as a fallback if no other solution
> is possible. If a replica identity other than “full” is set on the
> publisher side, a replica identity comprising the same or fewer
> columns must also be set on the subscriber side. See REPLICA IDENTITY
> for details on how to set the replica identity. If a table without a
> replica identity is added to a publication that replicates UPDATE or
> DELETE operations then subsequent UPDATE or DELETE operations will
> cause an error on the publisher. INSERT operations can proceed
> regardless of any replica identity.
>
> ~
>
> 2a.
> IMO the <quote>replica identity</quote> in the first sentence should
> be changed to be <firstterm>replica identity</firstterm>
>

>
> ~
>
> 2b.
> Typo: "subscriber s ide" --> "subscriber side"
>

fixed

> 2c.
> There is still one remaining "full" in this text. I think ought to be
> changed to <literal>FULL</literal> to match the others.
>
>
changed

> ======
> src/backend/executor/execReplication.c
>
> 3. IdxIsRelationIdentityOrPK
>
> +/*
> + * Given a relation and OID of an index, returns true if the
> + * index is relation's primary key's index or relation's
> + * replica identity index.
> + *
> + * Returns false otherwise.
> + */
> +bool
> +IdxIsRelationIdentityOrPK(Relation rel, Oid idxoid)
> +{
> + Assert(OidIsValid(idxoid));
> +
> + return RelationGetReplicaIndex(rel) == idxoid ||
> + RelationGetPrimaryKeyIndex(rel) == idxoid;
> }
>
> ~
>
> Since the function name mentions RI (1st) and then PK (2nd), and since
> the implementation also has the same order, I think the function
> comment should use the same consistent order when describing what it
> does.
>

alright, done

>
> ======
> src/backend/replication/logical/relation.c
>
> 4. FindUsableIndexForReplicaIdentityFull
>
> +/*
> + * Returns an index oid if there is an index that can be used
> + * via the apply worker. The index should be btree, non-partial
> + * and have at least one column reference (e.g., should
> + * not consist of only expressions). The limitations arise from
> + * RelationFindReplTupleByIndex(), which is designed to handle
> + * PK/RI and these limitations are inherent to PK/RI.
> + *
> + * There are no fundamental problems for supporting non-btree
> + * and/or partial indexes. We should mostly relax the limitations
> + * in RelationFindReplTupleByIndex().
> + *
> + * If no suitable index is found, returns InvalidOid.
> + *
> + * Note that this is not a generic function, it expects REPLICA
> + * IDENTITY FULL for the remote relation.
> + */
>
> ~
>
> 4a.
> Minor rewording of 1st sentence.
>
> BEFORE
> Returns an index oid if there is an index that can be used via the apply
> worker.
>
> SUGGESTION
> Returns the oid of an index that can be used via the apply worker.
>
>
looks better, applied

>
> 4b.
> + * There are no fundamental problems for supporting non-btree
> + * and/or partial indexes. We should mostly relax the limitations
> + * in RelationFindReplTupleByIndex().
>
> I think this paragraph should come later in the comment (just before
> the Note) and should also have "XXX" prefix to indicate it is some
> implementation note for future versions.
>
>
>
done

>
> 5. GetRelationIdentityOrPK
>
> +/*
> + * Get replica identity index or if it is not defined a primary key.
> + *
> + * If neither is defined, returns InvalidOid
> + */
> +Oid
> +GetRelationIdentityOrPK(Relation rel)
> +{
> + Oid idxoid;
> +
> + idxoid = RelationGetReplicaIndex(rel);
> +
> + if (!OidIsValid(idxoid))
> + idxoid = RelationGetPrimaryKeyIndex(rel);
> +
> + return idxoid;
> +}
>
> This is really very similar code to the other new function called
> IdxIsRelationIdentityOrPK. I wondered if such similar functions could
> be defined together.
>

Makes sense, moved them closer, also changed IdxIsRelationIdentityOrPK to
rely on
GetRelationIdentityOrPK()

>
> ~~~
>
> 6. FindLogicalRepUsableIndex
>
> +/*
> + * Returns an index oid if we can use an index for subscriber. If not,
> + * returns InvalidOid.
> + */
>
> SUGGESTION
> Returns the oid of an index that can be used by a subscriber.
> Otherwise, returns InvalidOid.
>
>
applies.

Now attaching v28 of the patch, which includes the reviews from this mail
and [1].

Thanks,
Onder

[1]
https://www.postgresql.org/message-id/OS0PR01MB5716BE4954A99EAF14F4D1F294B39%40OS0PR01MB5716.jpnprd01.prod.outlook.com

Attachment Content-Type Size
v28_0001_use_index_on_subs_when_pub_rep_ident_full.patch application/octet-stream 67.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Davis 2023-03-03 07:58:07 Re: Minimal logical decoding on standbys
Previous Message Önder Kalacı 2023-03-03 07:39:43 Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher