Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From: Önder Kalacı <onderkalaci(at)gmail(dot)com>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, Marco Slot <marco(dot)slot(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date: 2023-03-10 10:58:25
Message-ID: CACawEhW1q2Qx53JFDoObccP9d4HuTJxb6dA+b4qmHnJjFh68hg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Peter, all

> src/backend/replication/logical/relation.c
>
> 1. FindUsableIndexForReplicaIdentityFull
>
> +static Oid
> +FindUsableIndexForReplicaIdentityFull(Relation localrel)
> +{
> + List *indexlist = RelationGetIndexList(localrel);
> + Oid usableIndex = InvalidOid;
> + ListCell *lc;
> +
> + foreach(lc, indexlist)
> + {
> + Oid idxoid = lfirst_oid(lc);
> + bool isUsableIndex;
> + Relation indexRelation = index_open(idxoid, AccessShareLock);
> + IndexInfo *indexInfo = BuildIndexInfo(indexRelation);
> +
> + isUsableIndex = IsIndexUsableForReplicaIdentityFull(indexInfo);
> +
> + index_close(indexRelation, AccessShareLock);
> +
> + if (isUsableIndex)
> + {
> + /* we found one eligible index, don't need to continue */
> + usableIndex = idxoid;
> + break;
> + }
> + }
> +
> + return usableIndex;
> +}
>
> This comment is not functional -- if you prefer the code as-is, then
> ignore this comment.
>
> But, personally I would:
> - Move some of that code from the declarations. I feel it would be
> better if the index_open/index_close were both in the code-body
> instead of half in declarations and half not.
> - Remove the 'usableIndex' variable, and just return directly.
> - Shorten all the long names (and use consistent 'idx' instead of
> sometimes 'idx' and sometimes 'index')
>
> SUGGESTION (YMMV)
>
> static Oid
> FindUsableIndexForReplicaIdentityFull(Relation localrel)
> {
> List *idxlist = RelationGetIndexList(localrel);
> ListCell *lc;
>
> foreach(lc, idxlist)
> {
> Oid idxoid = lfirst_oid(lc);
> bool isUsableIdx;
> Relation idxRel;
> IndexInfo *idxInfo;
>
> idxRel = index_open(idxoid, AccessShareLock);
> idxInfo = BuildIndexInfo(idxRel);
> isUsableIdx = IsIndexUsableForReplicaIdentityFull(idxInfo);
> index_close(idxRel, AccessShareLock);
>
> /* Return the first eligible index found */
> if (isUsableIdx)
> return idxoid;
> }
>
> return InvalidOid;
> }
>

applied your suggestion. I think it made it slightly easier to follow.

>
> ======
> .../subscription/t/032_subscribe_use_index.pl
>
> 2. SUBSCRIPTION CREATE/DROP INDEX WORKS WITHOUT ISSUES
>
> 2a.
> # Testcase start: SUBSCRIPTION CREATE/DROP INDEX WORKS WITHOUT ISSUES
> #
> # This test ensures that after CREATE INDEX, the subscriber can
> automatically
> # use one of the indexes (provided that it fulfils the requirements).
> # Similarly, after DROP index, the subscriber can automatically switch to
> # sequential scan
>
>
> The last sentence is missing full-stop.
>
>
fixed

> ~
>
> 2b.
> # now, create index and see that the index is used
> $node_subscriber->safe_psql('postgres',
> "CREATE INDEX test_replica_id_full_idx ON test_replica_id_full(x)");
>
> Don't say "and see that the index is used". Yes, that is what this
> whole test is doing, but it is not what the psql following this
> comment is doing.
>

true

> ~
>
> 2c.
> $node_publisher->safe_psql('postgres',
> "UPDATE test_replica_id_full SET x = x + 1 WHERE x = 15;");
> $node_publisher->wait_for_catchup($appname);
>
>
> # wait until the index is used on the subscriber
>
> The double blank lines here should be single.
>
> ~
>

fixed,

>
> 2d.
> # now, the update could either use the test_replica_id_full_idx or
> # test_replica_id_full_idy index it is not possible for user to control
> # which index to use
>
> This sentence should break at "it".
>
> Aso "user" --> "the user"

SUGGESTION
> # now, the update could either use the test_replica_id_full_idx or
> # test_replica_id_full_idy index; it is not possible for the user to
> control
> # which index to use
>
>
looks good, thanks

> ~
>
> 2e.
> # let's also test dropping test_replica_id_full_idy and
> # hence use test_replica_id_full_idx
>
>
> I think you ought to have dropped the other (first) index because we
> already know that the first index had been used (from earlier), but we
> are not 100% sure if the 'y' index has been chosen yet.
>

make sense. Though in general it is hard to check pg_stat_all_indexes
for any of the indexes on this test, as we don't know the exact number
of tuples for each. Just wanted to explicitly note

> ~~~~
>
> 3. SUBSCRIPTION USES INDEX WITH MULTIPLE ROWS AND COLUMNS
>
> 3a.
> # deletes 20 rows
> $node_publisher->safe_psql('postgres',
> "DELETE FROM test_replica_id_full WHERE x IN (5, 6);");
>
> # updates 20 rows
> $node_publisher->safe_psql('postgres',
> "UPDATE test_replica_id_full SET x = 100, y = '200' WHERE x IN (1,
> 2);");
>
>
> "deletes" --> "delete"
>
> "updates" --> "update"
>

Well, I thought the command *deletes* but I guess delete looks better

>
> ~~~
>
> 4. SUBSCRIPTION USES INDEX WITH DROPPED COLUMNS
>
> # updates 200 rows
> $node_publisher->safe_psql('postgres',
> "UPDATE test_replica_id_full SET x = x + 1 WHERE x IN (5, 6);");
>
>
> "updates" --> "update"
>
> "200 rows" ??? is that right -- 20 maybe ???
>
>
I guess this is from an earlier version of the patch, I fixed these types
of errors.

> ~~~
>
> 5. SUBSCRIPTION USES INDEX ON PARTITIONED TABLES
>
> 5a.
> # updates rows and moves between partitions
> $node_publisher->safe_psql('postgres',
> "UPDATE users_table_part SET value_1 = 0 WHERE user_id = 4;");
>
> "updates rows and moves between partitions" --> "update rows, moving
> them to other partitions"
>
>
fixed, thanks

> ~
>
> 5b.
> # deletes rows from different partitions
>
>
> "deletes" --> "delete"
>
>
fixed, and searched for similar errors but couldn't see any more

~~~

> 6. SUBSCRIPTION DOES NOT USE INDEXES WITH ONLY EXPRESSIONS OR PARTIAL INDEX
>
> 6a.
> # update 2 rows
> $node_publisher->safe_psql('postgres',
> "UPDATE people SET firstname = 'Nan' WHERE firstname =
> 'first_name_1';");
> $node_publisher->safe_psql('postgres',
> "UPDATE people SET firstname = 'Nan' WHERE firstname =
> 'first_name_2' AND lastname = 'last_name_2';");
>
> IMO 'Nan' seemed a curious name to assign as a test value, because it
> seems like it has a special meaning but in reality, I don't think it
> does. Even 'xxx' would be better.
>

changed to "no-name" as "xxx" also looks not so good

>
> ~
>
> 6b.
> # make sure none of the indexes is not used on the subscriber
> $result = $node_subscriber->safe_psql('postgres',
> "select sum(idx_scan) from pg_stat_all_indexes where indexrelname
> IN ('people_names_expr_only', 'people_names_partial')");
> is($result, qq(0), 'ensure subscriber tap_sub_rep_full updates two
> rows via seq. scan with index on expressions');
>
> ~
>
> Looks like a typo in this comment: "none of the indexes is not used" ???
>
> dropped "not"

> ~~~
>
> 7. SUBSCRIPTION CAN USE INDEXES WITH EXPRESSIONS AND COLUMNS
>
> 7a.
> # update 2 rows
> $node_publisher->safe_psql('postgres',
> "UPDATE people SET firstname = 'Nan' WHERE firstname =
> 'first_name_1';");
> $node_publisher->safe_psql('postgres',
> "UPDATE people SET firstname = 'Nan' WHERE firstname =
> 'first_name_3' AND lastname = 'last_name_3';");
>
> Same as #6a, 'Nan' seems like a strange test value to assign to a name.
>
>
similarly, changed to no-name

> ~~~
>
> 8. Some NULL values
>
> $node_publisher->safe_psql('postgres',
> "INSERT INTO test_replica_id_full VALUES (1), (2), (3);");
> $node_publisher->safe_psql('postgres',
> "UPDATE test_replica_id_full SET x = x + 1 WHERE x = 1;");
> $node_publisher->safe_psql('postgres',
> "UPDATE test_replica_id_full SET x = x + 1 WHERE x = 3;");
>
> ~
>
> 8a.
> For some reason, this insert/update psql was not commented.
>
>
added some

> ~
>
> 8b.
> Maybe this test data could be more obvious by explicitly inserting the
> NULLs?
>
>
Well, that's a bit hard. We merged a few tests for perf reasons. And, I
merged this
test with "missing column" test. Now, the NULL values are triggered due to
missing column on the source.

> ~~~
>
> 9. Unique index that is not primary key or replica identity
>
> 9a.
> Why are other "Testcase start" comments all uppercase but not this one?
>
>
fixed, there was one more

> ~~~
>
> 10. SUBSCRIPTION USES INDEX WITH PUB/SUB different data
>
> Why is there a mixed case in this "Test case:" comment?
>

no specific reason, fixed

>
> ~~~
>
> 11. PUBLICATION LACKS THE COLUMN ON THE SUBS INDEX
>
> 11a.
> # The subsriber has an additional column compared to publisher,
> # and the index is on that column. We still pick the index scan
> # on the subscriber even though it is practically similar to
> # sequential scan
>
> Typo "subsriber"
>

I guess I fixed this in a recent iteration, I cannot find it.

>
> Missing full-stop on last sentence.
>

Similarly, probably merged into another test.

Still went over the all such explanations in the test, and ensured
we have the full stop

>
> ~
>
> 11b.
> # make sure that the subscriber has the correct data
> # we only deleted 1 row
> $result = $node_subscriber->safe_psql('postgres',
> "SELECT sum(x) FROM test_replica_id_full");
> is($result, qq(232), 'ensure subscriber has the correct data at the
> end of the test');
>
>
> Why does that say "deleted 1 row", when the previous operation was not a
> DELETE?
>
>
Probably due to merging multiple tests into one. Fixed now.

Again, thanks for thorough review. Attached v41.

See the reason for 0001 patch in [1].

Onder KALACI.

[1]
https://www.postgresql.org/message-id/CACawEhUu6S8E4Oo7%2Bs5iaq%3DyLRZJb6uOZeEQSGJj-7NVkDzSaw%40mail.gmail.com

Attachment Content-Type Size
v41_0002_use_index_on_subs_when_pub_rep_ident_full.patch application/octet-stream 55.4 KB
v41_0001_Skip-dropped_columns_for_tuples_equal.patch application/octet-stream 1.2 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2023-03-10 11:15:06 Re: Combine pg_walinspect till_end_of_wal functions with others
Previous Message vignesh C 2023-03-10 10:28:30 Re: Support logical replication of DDLs