Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Önder Kalacı <onderkalaci(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, Marco Slot <marco(dot)slot(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date: 2023-03-09 01:03:53
Message-ID: CAHut+PubGNUrT7j6mDYVHgR6pkBYGbLTPdMJr8_W2_SFOSSOGg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here are my review comments for v37-0001.

======
General - should/must.

1.
In my previous review [1] (comment #1) I wrote that only some of the
"should" were misleading and gave examples where to change. But I
didn't say that *every* usage of that word was wrong, so your global
replace of "should" to "must" has modified a couple of places in
unexpected ways.

Details are in subsequent review comments below -- see #2b, #3, #5.

======
doc/src/sgml/logical-replication.sgml

2.
A published table must have a “replica identity” configured in order
to be able to replicate UPDATE and DELETE operations, so that
appropriate rows to update or delete can be identified on the
subscriber side. By default, this is the primary key, if there is one.
Another unique index (with certain additional requirements) can also
be set to be the replica identity. If the table does not have any
suitable key, then it can be set to replica identity “full”, which
means the entire row becomes the key. When replica identity “full” is
specified, indexes can be used on the subscriber side for searching
the rows. Candidate indexes must be btree, non-partial, and have at
least one column reference (i.e. cannot consist of only expressions).
These restrictions on the non-unique index properties adheres to some
of the restrictions that are enforced for primary keys. Internally, we
follow a similar approach for supporting index scans within logical
replication scope. If there are no such suitable indexes, the search
on the subscriber side can be very inefficient, therefore replica
identity “full” must only be used as a fallback if no other solution
is possible. If a replica identity other than “full” is set on the
publisher side, a replica identity comprising the same or fewer
columns must also be set on the subscriber side. See REPLICA IDENTITY
for details on how to set the replica identity. If a table without a
replica identity is added to a publication that replicates UPDATE or
DELETE operations then subsequent UPDATE or DELETE operations will
cause an error on the publisher. INSERT operations can proceed
regardless of any replica identity.

~~

2a.
My previous review [1] (comment #2) was not fixed quite as suggested.

Please change:
"adheres to" --> "adhere to"

~~

2b. should/must

This should/must change was OK as it was before, because here it is only advice.

Please change this back how it was:
"must only be used as a fallback" --> "should only be used as a fallback"

======
src/backend/executor/execReplication.c

3. build_replindex_scan_key

/*
* Setup a ScanKey for a search in the relation 'rel' for a tuple 'key' that
* is setup to match 'rel' (*NOT* idxrel!).
*
- * Returns whether any column contains NULLs.
+ * Returns how many columns must be used for the index scan.
+ *

~

This should/must change does not seem quite right.

SUGGESTION (reworded)
Returns how many columns to use for the index scan.

~~~

4. build_replindex_scan_key

>
> Based on the discussions below, I kept as-is. I really don't want to do unrelated
> changes in this patch, as I also got several feedback for not doing it,
>

Hmm, although this code pre-existed I don’t consider this one as
"unrelated changes" because the patch introduced the new "if
(!AttributeNumberIsValid(table_attno))" which changed things. As I
wrote to Amit yesterday [2] IMO it would be better to do the 'opttype'
assignment *after* the potential 'continue' otherwise there is every
chance that the assignment is just redundant. And if you move the
assignment where it belongs, then you might as well declare the
variable in the more appropriate place at the same time – i.e. with
'opfamily' declaration. Anyway, I've given my reason a couple of times
now, so if you don't want to change it I won't about it debate
anymore.

======
src/backend/replication/logical/relation.c

5. FindUsableIndexForReplicaIdentityFull

+ * XXX: There are no fundamental problems for supporting non-btree indexes.
+ * We mostly need to relax the limitations in RelationFindReplTupleByIndex().
+ * For partial indexes, the required changes are likely to be larger. If
+ * none of the tuples satisfy the expression for the index scan, we must
+ * fall-back to sequential execution, which might not be a good idea in some
+ * cases.

The should/must change (the one in the XXX comment) does not seem quite right.

SUGGESTION
"we must fall-back to sequential execution" --> "we fallback to
sequential execution"

======
.../subscription/t/032_subscribe_use_index.pl

FYI, I get TAP test in error (Note - this is when only patch 0001 is appied)

t/031_column_list.pl ............... ok
t/032_subscribe_use_index.pl ....... 19/?
# Failed test 'ensure subscriber has not used index with
enable_indexscan=false'
# at t/032_subscribe_use_index.pl line 806.
# got: '1'
# expected: '0'
t/032_subscribe_use_index.pl ....... 21/? # Looks like you failed 1 test of 22.
t/032_subscribe_use_index.pl ....... Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/22 subtests
t/100_bugs.pl ...................... ok

AFAICT there is a test case that is testing the patch 0002
functionality even when patch 0002 is not applied yet.

------
[1] Replies to my review v35 -
https://www.postgresql.org/message-id/CACawEhXnTQyGNCXeQGhN3_%2BGWujhS3MyY27C4sSqRvZ%2B_B7FLg%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAHut%2BPvLvDGFzk4fSaevGY5h2PpAeSZjJjje_7vBdb7ag%3DzswA%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2023-03-09 01:15:39 Re: Add pg_walinspect function with block info columns
Previous Message Thomas Munro 2023-03-09 01:02:46 Re: Add error functions: erf() and erfc()