Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: doc: improve the restriction description of using indexes on REPLICA IDENTITY FULL table.
Date: 2023-07-05 05:46:42
Message-ID: CAA4eK1+hGa53aezZFibO6i9B9wcbwd+weZt2O7DAr3EGMJyTSg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jul 5, 2023 at 9:01 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> Hi. Here are some review comments for this patch.
>
> +1 for the patch idea.
>
> ------
>
> I wasn't sure about the code comment adjustments suggested by Amit [1]:
> "Accordingly, the comments atop build_replindex_scan_key(),
> FindUsableIndexForReplicaIdentityFull(), IsIndexOnlyOnExpression()
> should also be adjusted."
>
> Actually, I thought the FindUsableIndexForReplicaIdentityFull()
> function comment is *already* describing the limitation about the
> leftmost column (see fragment below), so IIUC the Sawada-san patch is
> only trying to expose that same information in the PG docs.
>
> [FindUsableIndexForReplicaIdentityFull comment fragment]
> * We also skip indexes if the remote relation does not contain the leftmost
> * column of the index. This is because in most such cases sequential scan is
> * favorable over index scan.
>

This implies that the leftmost column of the index must be
non-expression but I feel what the patch intends to say in docs is
more straightforward and it doesn't match what the proposed docs says.

> ~
>
> OTOH, it may be better if these limitation rule details were not
> scattered in the code. e.g. build_replindex_scan_key() function
> comment can be simplified:
>
> CURRENT:
> * This is not generic routine, it expects the idxrel to be a btree, non-partial
> * and have at least one column reference (i.e. cannot consist of only
> * expressions).
>
> SUGGESTION:
> This is not a generic routine. It expects the 'idxrel' to be an index
> deemed "usable" by the function
> FindUsableIndexForReplicaIdentityFull().
>

Note that for PK/ReplicaIdentity, we don't even call
FindUsableIndexForReplicaIdentityFull() but build_replindex_scan_key()
would still be called for such index. So, I am not sure your proposed
wording is an improvement.

> ------
> doc/src/sgml/logical-replication.sgml
>
> 1.
> the key. When replica identity <literal>FULL</literal> is specified,
> indexes can be used on the subscriber side for searching the rows.
> Candidate
> indexes must be btree, non-partial, and have at least one column reference
> - (i.e. cannot consist of only expressions). These restrictions
> - on the non-unique index properties adhere to some of the restrictions that
> - are enforced for primary keys. If there are no such suitable indexes,
> + at the leftmost column indexes (i.e. cannot consist of only
> expressions). These
> + restrictions on the non-unique index properties adhere to some of
> the restrictions
> + that are enforced for primary keys. If there are no such suitable indexes,
> the search on the subscriber side can be very inefficient, therefore
> replica identity <literal>FULL</literal> should only be used as a
> fallback if no other solution is possible. If a replica identity other
>
> Isn't this using the word "indexes" with different meanings in the
> same sentence? e.g. IIUC "leftmost column indexes" is referring to the
> ordinal number of the index fields. TBH, I am not sure the patch
> wording is even describing the limitation in quite the same way as
> what the code is actually doing.
>
> HEAD (code comment):
> * We also skip indexes if the remote relation does not contain the leftmost
> * column of the index. This is because in most such cases sequential scan is
> * favorable over index scan.
>
> HEAD (rendered docs)
> Candidate indexes must be btree, non-partial, and have at least one
> column reference (i.e. cannot consist of only expressions). These
> restrictions on the non-unique index properties adhere to some of the
> restrictions that are enforced for primary keys.
>
> PATCHED (rendered docs)
> Candidate indexes must be btree, non-partial, and have at least one
> column reference at the leftmost column indexes (i.e. cannot consist
> of only expressions). These restrictions on the non-unique index
> properties adhere to some of the restrictions that are enforced for
> primary keys.
>
> MY SUGGESTION:
> Candidate indexes must be btree, non-partial, and have at least one
> column reference (i.e. cannot consist of only expressions).
> Furthermore, the leftmost field of the candidate index must be a
> column of the published table. These restrictions on the non-unique
> index properties adhere to some of the restrictions that are enforced
> for primary keys.
>

I don't know if this suggestion is what the code is actually doing. In
function RemoteRelContainsLeftMostColumnOnIdx(), we have the following
checks:
==========
keycol = indexInfo->ii_IndexAttrNumbers[0];
if (!AttributeNumberIsValid(keycol))
return false;

if (attrmap->maplen <= AttrNumberGetAttrOffset(keycol))
return false;

return attrmap->attnums[AttrNumberGetAttrOffset(keycol)] >= 0;
==========

The first of these checks indicates that the leftmost column of the
index should be non-expression, second and third indicates what you
suggest in your wording. We can also think that what you wrote in a
way is a superset of "leftmost index column is a non-expression" and
"leftmost index column should be present in remote rel" but I feel it
would be better to explicit about the first part as it is easy to
understand for users at least in docs.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2023-07-05 06:00:25 Re: pg_upgrade and cross-library upgrades
Previous Message Peter Eisentraut 2023-07-05 05:22:33 Re: Clean up command argument assembly