Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From: Önder Kalacı <onderkalaci(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>
Cc: Marco Slot <marco(dot)slot(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date: 2023-02-22 14:24:03
Message-ID: CACawEhUU8MgaCCLeLmcZe_XkOYnLEg9myWbsRZTogp_R8Jx02Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Peter, Amit, Shi Yu and all,

(I'm replying multiple reviews in this single reply, hope that's fine)

On Fri, Feb 17, 2023 at 5:57 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> FYI, I accidentally left this (v23) patch's TAP test
> t/032_subscribe_use_index.pl still lurking even after removing all
> other parts of this patch.
>
> In this scenario, the t/032 test gets stuck (build of latest HEAD)
>
> IIUC the patch is only meant to affect performance, so I expected this
> 032 test to work regardless of whether the rest of the patch is
> applied.
>
> Anyway, it hangs every time for me. I didn't dig looking for the
> cause, but if it requires patched code for this new test to pass, I
> thought it indicates something wrong either with the test or something
> more sinister the new test has exposed. Maybe I am mistaken
>

>
> Sorry, probably the above was a false alarm. After a long time
> (minutes) the stuck test did eventually timeout with:
> t/032_subscribe_use_index.pl ....... # poll_query_until timed out
> executing this query:
> # select (idx_scan = 1) from pg_stat_all_indexes where indexrelname =
> 'test_replica_id_full_idx';
> # expecting this output:
> # t
> # last actual query output:
> # f
> # with stderr:
> t/032_subscribe_use_index.pl ....... Dubious, test returned 29 (wstat
> 7424, 0x1d00)
>
>
I can tell that this is the expected behavior. Majority of the tests do the
following:
- update/delete row on the source
- check pg_stat_all_indexes on the target

So, given that HEAD does not use any indexes, it is expected that the tests
would
wait until poll_query_until timeout. That's why, I do not see/expect any
problems on
HEAD. I run the test file by removing the poll_query_until for the index
scan counts,
and all finished properly.

I think such a case exists. I tried the following cases based on v23 patch.

As I noted in the earlier reply, I created another patch, which optionally
gives the ability to
disable index scans on the subscription level for the replica identity
full case.

That is the second patch attached in this mail,
named v25_0001_optionally_disable_index.patch.

Here are some review comments for patch v23.

Thanks Peter, see the following reply:

======
> General

> 1.
> IIUC the previous logic for checking "cost" comparisons and selecting
> the "cheapest" strategy is no longer present in the latest patch.

In that case, I think there are some leftover stale comments that need

changing. For example,

1a. Commit message:
> "let the planner sub-modules compare the costs of index versus
> sequential scan and choose the cheapest."
> 1b. Commit message:
> "Finally, pick the cheapest `Path` among."
> 1c. FindLogicalRepUsableIndex function:
> + * We are looking for one more opportunity for using an index. If
> + * there are any indexes defined on the local relation, try to pick
> + * the cheapest index.

Makes sense, the commit message and function messages should reflect
the new logic. I went over the patch with a detailed look for this.

======
> doc/src/sgml/logical-replication.sgml
> If replica identity "full" is used, indexes can be used on the
> subscriber side for seaching the rows. The index should be btree,
> non-partial and have at least one column reference (e.g., should not
> consists of only expressions). If there are no suitable indexes, the
> search on the subscriber side is very inefficient and should only be
> used as a fallback if no other solution is possible

2a.
> Fixed typo "seaching", and minor rewording.

SUGGESTION
> When replica identity "full" is specified, indexes can be used on the
> subscriber side for searching the rows. These indexes should be btree,
> non-partial and have at least one column reference (e.g., should not
> consist of only expressions). If there are no such suitable indexes,
> the search on the subscriber side can be very inefficient, therefore
> replica identity "full" should only be used as a fallback if no other
> solution is possible.

I like your suggestion, updated

2b.
> I know you are just following some existing text here, but IMO this
> should probably refer to replica identity <literal>FULL</literal>
> instead of "full".

I guess that works, I don't have any preference / knowledge on this.

>
> ======
> src/backend/executor/execReplication.c
> 3. IdxIsRelationIdentityOrPK
> +/*
> + * Given a relation and OID of an index, returns true if
> + * the index is relation's primary key's index or
> + * relaton's replica identity index.
> + *
> + * Returns false otherwise.
> + */
> +static bool
> +IdxIsRelationIdentityOrPK(Relation rel, Oid idxoid)
> +{
> + Assert(OidIsValid(idxoid));
> +
> + if (RelationGetReplicaIndex(rel) == idxoid ||
> + RelationGetPrimaryKeyIndex(rel) == idxoid)
> + return true;
> +
> + return false;
> 3a.
> Comment typo "relaton"

fixed

3b.
> Code could be written using single like below if you wish (but see #2c)
> return RelationGetReplicaIndex(rel) == idxoid ||
> RelationGetPrimaryKeyIndex(rel) == idxoid;
> ~
> 3c.
> Actually, RelationGetReplicaIndex and RelationGetPrimaryKeyIndex
> implementations are very similar so it seemed inefficient to be
> calling both of them. IMO it might be better to just make a new
> relcache function IdxIsRelationIdentityOrPK(Relation rel, Oid idxoid).
> This implementation will be similar to those others, but now you need
> only to call the workhorse RelationGetIndexList *one* time.
> ~~~

Regarding (3c), RelationGetIndexList is only called once, when
!relation->rd_indexvalid.
So, there seems not to be a necessary case for merging two functions to me.
Also, I'd rather keep both functions, as I know that some extensions rely
on these functions separately.

Regarding (3b), it makes sense, applied.

> 4. RelationFindReplTupleByIndex
> bool found;
> + TypeCacheEntry **eq = NULL; /* only used when the index is not repl.
> ident
> + * or pkey */
> + bool idxIsRelationIdentityOrPK;
> If you change the comment to say "RI" instead of "repl. Ident" then it
> can all fit on one line, which would be an improvement.

Done, also changed pkey to PK as this seems to be used throughout the code.

======
> src/backend/replication/logical/relation.c
> 5.
> #include "replication/logicalrelation.h"
> #include "replication/worker_internal.h"
> +#include "optimizer/cost.h"
> #include "utils/inval.h"
> Can that #include be added in alphabetical order like the others or not?

Sure, it seems like I intended to do it, but made a small mistake :)

6. logicalrep_partition_open
> + /*
> + * Finding a usable index is an infrequent task. It occurs when an
> + * operation is first performed on the relation, or after invalidation of
> + * of the relation cache entry (such as ANALYZE or CREATE/DROP index on
> + * the relation).
> + */
> + entry->usableIndexOid = FindLogicalRepUsableIndex(partrel, remoterel);
> +
> Typo "of of the relation"

Fixed

7. FindUsableIndexForReplicaIdentityFull
> +static Oid
> +FindUsableIndexForReplicaIdentityFull(Relation localrel)
> +{
> + MemoryContext usableIndexContext;
> + MemoryContext oldctx;
> + Oid usableIndex;
> + Oid idxoid;
> + List *indexlist;
> + ListCell *lc;
> + Relation indexRelation;
> + IndexInfo *indexInfo;
> + bool is_btree;
> + bool is_partial;
> + bool is_only_on_expression;
> It looks like some of these variables are only used within the scope
> of the foreach loop, so I think that is where they should be declared.

makes sense, done

8.
> + usableIndex = InvalidOid;
> Might as well do that assignment at the declaration.

done

9. FindLogicalRepUsableIndex
> + /*
> + * Simple case, we already have a primary key or a replica identity index.
> + *
> + * Note that we do not use index scans below when enable_indexscan is
> + * false. Allowing primary key or replica identity even when index scan is
> + * disabled is the legacy behaviour. So we hesitate to move the below
> + * enable_indexscan check to be done earlier in this function.
> + */
> + idxoid = GetRelationIdentityOrPK(localrel);
> + if (OidIsValid(idxoid))
> + return idxoid;
> +
> + /* If index scans are disabled, use a sequential scan */
> + if (!enable_indexscan)
> + return InvalidOid;
> IMO that "Note" really belongs with the if (!enable)indexscan) more like
> this:
> SUGGESTION
> /*
> * Simple case, we already have a primary key or a replica identity index.
> */
> idxoid = GetRelationIdentityOrPK(localrel);
> if (OidIsValid(idxoid))
> return idxoid;
> /*
> * If index scans are disabled, use a sequential scan.
> *
> * Note we hesitate to move this check to earlier in this function
> * because allowing primary key or replica identity even when index scan
> * is disabled is the legacy behaviour.
> */
> if (!enable_indexscan)
> return InvalidOid;

makes sense, moved

> ======
> src/backend/replication/logical/worker.c
> 10. get_usable_indexoid
> +/*
> + * Decide whether we can pick an index for the relinfo (e.g., the
> relation)
> + * we're actually deleting/updating from. If it is a child partition of
> + * edata->targetRelInfo, find the index on the partition.
> + *
> + * Note that if the corresponding relmapentry has invalid usableIndexOid,
> + * the function returns InvalidOid.
> + */
> "(e.g., the relation)" --> "(i.e. the relation)"

fixed

Thanks for your patch. Here are some comments.

Thanks Shi Yu, see my reply below

1.
> I noticed that get_usable_indexoid() is called in
> apply_handle_update_internal()
> and apply_handle_delete_internal() to get the usable index. Could
> usableIndexOid
> be a parameter of these two functions? Because we have got the
> LogicalRepRelMapEntry when calling them and if we do so, we can get
> usableIndexOid without get_usable_indexoid(). Otherwise for partitioned
> tables,
> logicalrep_partition_open() is called in get_usable_indexoid() and
> searching
> the entry via hash_search() will increase cost.

I think I cannot easily follow this comment. We call
logicalrep_partition_open()
because if an update/delete is on a partitioned table, we should find the
corresponding local index on the partition itself. edata->targetRel points
to the
partitioned table, and we map it to the partition inside
get_usable_indexoid().

Overall, I cannot see how we can avoid the call
to logicalrep_partition_open().
Can you please explain a little further?

Note that logicalrep_partition_open() is cheap for the cases where there is
no
invalidations (which is probably most of the time)

2.
> + * This attribute is an expression, and
> + * SuitableIndexPathsForRepIdentFull() was called
> earlier when the
> + * index for subscriber was selected. There, the
> indexes
> + * comprising *only* expressions have already been
> eliminated.

The comment looks need to be updated:
> SuitableIndexPathsForRepIdentFull
> ->
> FindUsableIndexForReplicaIdentityFull

Yes, updated.

3.
> /* Build scankey for every attribute in the index. */
> - for (attoff = 0; attoff <
> IndexRelationGetNumberOfKeyAttributes(idxrel); attoff++)
> + for (index_attoff = 0; index_attoff <
> IndexRelationGetNumberOfKeyAttributes(idxrel);
> + index_attoff++)
> {
> Should the comment be changed? Because we skip the attributes that are
> expressions.

makes sense

4.
> + Assert(RelationGetReplicaIndex(rel) !=
> RelationGetRelid(idxrel) &&
> + RelationGetPrimaryKeyIndex(rel) !=
> RelationGetRelid(idxrel));
> Maybe we can call the new function idxIsRelationIdentityOrPK()?

Makes sense, becomes easier to understand.

Here are some comments on the test cases.

1. in test case "SUBSCRIPTION RE-CALCULATES INDEX AFTER CREATE/DROP INDEX"
> +# now, ingest more data and create index on column y which has higher
> cardinality
> +# so that the future commands use the index on column y
> +$node_publisher->safe_psql('postgres',
> + "INSERT INTO test_replica_id_full SELECT 50, i FROM
> generate_series(0,3100)i;");
> +$node_subscriber->safe_psql('postgres',
> + "CREATE INDEX test_replica_id_full_idy ON
> test_replica_id_full(y)");
> We don't pick the cheapest index in the current patch, so should we modify
> this
> part of the test?

I think I already changed that test. I kept the test so that we still make
sure that even if we
create/drop indexes, we do not mess anything. I agree that the wording /
comments were
stale.

Can you check if it looks better now?

> BTW, the following comment in FindLogicalRepUsableIndex() need to be
> changed,
> too.
> + * We are looking for one more opportunity for using an
> index. If
> + * there are any indexes defined on the local relation,
> try to pick
> + * the cheapest index.

makes sense, Peter also had a similar comment, fixed.

2. Is there any reasons why we need the test case "SUBSCRIPTION USES INDEX
> WITH
> DROPPED COLUMNS"? Has there been a problem related to dropped columns
> before?

Not really, but dropped columns are tricky in general. As far as I know,
those columns
continue to exist in pg_attribute, which might cause some edge cases. So, I
wanted to
have coverage for that.

> 3. in test case "SUBSCRIPTION USES INDEX ON PARTITIONED TABLES"
> +# deletes rows and moves between partitions
> +$node_publisher->safe_psql('postgres',
> + "DELETE FROM users_table_part WHERE user_id = 1 and value_1 = 1;");
> +$node_publisher->safe_psql('postgres',
> + "DELETE FROM users_table_part WHERE user_id = 12 and value_1 =
> 12;");
> "moves between partitions" in the comment seems wrong.

Yes, probably copy & paste error from the UPDATE test

4. in test case "SUBSCRIPTION DOES NOT USE INDEXES WITH ONLY EXPRESSIONS"
> +# update 2 rows
> +$node_publisher->safe_psql('postgres',
> + "UPDATE people SET firstname = 'Nan' WHERE firstname =
> 'first_name_1';");
> +$node_publisher->safe_psql('postgres',
> + "UPDATE people SET firstname = 'Nan' WHERE firstname =
> 'first_name_2' AND lastname = 'last_name_2';");
> +
> +# make sure the index is not used on the subscriber
> +$result = $node_subscriber->safe_psql('postgres',
> + "select idx_scan from pg_stat_all_indexes where indexrelname =
> 'people_names'");
> +is($result, qq(0), 'ensure subscriber tap_sub_rep_full updates two rows
> via seq. scan with index on expressions');
> +
> I think it would be better to call wait_for_catchup() before the check
> because
> we want to check the index is NOT used. Otherwise the check may pass
> because the
> rows have not yet been updated on subscriber.

that's right, added

5. in test case "SUBSCRIPTION BEHAVIOR WITH ENABLE_INDEXSCAN"
> +# show that index is not used even when enable_indexscan=false
> +$result = $node_subscriber->safe_psql('postgres',
> + "select idx_scan from pg_stat_all_indexes where indexrelname =
> 'test_replica_id_full_idx'");
> +is($result, qq(0), 'ensure subscriber has not used index with
> enable_indexscan=false');
> Should we remove the word "even" in the comment?

done

6.
> In each test case we re-create publications, subscriptions, and tables.
> Could we
> create only one publication and one subscription at the beginning, and use
> them
> in all test cases? I think this can save some time running the test file.

I'd rather keep as-is for (a) simplicity (b) other test files seem to have
similar patterns.

Do you think strongly that we should change the test file? It could make
debugging
the tests harder as well.

Tom, Andres, or others, do you have any suggestions on how to move
> forward with this patch?

Yes, happy to hear any feedback for the attached patch(es).

Thanks,
Onder KALACI

Önder Kalacı <onderkalaci(at)gmail(dot)com>, 21 Şub 2023 Sal, 17:25 tarihinde şunu
yazdı:

> Hi Amit, all
>
> Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, 15 Şub 2023 Çar, 07:37 tarihinde
> şunu yazdı:
>
>> On Wed, Feb 15, 2023 at 9:23 AM shiy(dot)fnst(at)fujitsu(dot)com
>> <shiy(dot)fnst(at)fujitsu(dot)com> wrote:
>> >
>> > On Sat, Feb 4, 2023 7:24 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
>> wrote:
>> > >
>> > > On Thu, Feb 2, 2023 at 2:03 PM Önder Kalacı <onderkalaci(at)gmail(dot)com>
>> wrote:
>> > > >
>> > > > On v23, I dropped the planner support for picking the index.
>> Instead, it simply
>> > > > iterates over the indexes and picks the first one that is suitable.
>> > > >
>> > > > I'm currently thinking on how to enable users to override this
>> decision.
>> > > > One option I'm leaning towards is to add a syntax like the
>> following:
>> > > >
>> > > > ALTER SUBSCRIPTION .. ALTER TABLE ... SET INDEX ...
>> > > >
>> > > > Though, that should probably be a seperate patch. I'm going to work
>> > > > on that, but still wanted to share v23 given picking the index
>> sounds
>> > > > complementary, not strictly required at this point.
>> > > >
>> > >
>> > > I agree that it could be a separate patch. However, do you think we
>> > > need some way to disable picking the index scan? This is to avoid
>> > > cases where sequence scan could be better or do we think there won't
>> > > exist such a case?
>> > >
>> >
>> > I think such a case exists. I tried the following cases based on v23
>> patch.
>> >
>> ...
>> > # Result
>> > The time executing update (the average of 3 runs is taken, the unit is
>> > milliseconds):
>> >
>> > +--------+---------+---------+
>> > | | patched | master |
>> > +--------+---------+---------+
>> > | case 1 | 3933.68 | 1298.32 |
>> > | case 2 | 1803.46 | 1294.42 |
>> > | case 3 | 1380.82 | 1299.90 |
>> > | case 4 | 1042.60 | 1300.20 |
>> > | case 5 | 691.69 | 1297.51 |
>> > | case 6 | 578.50 | 1300.69 |
>> > | case 7 | 566.45 | 1302.17 |
>> > +--------+---------+---------+
>> >
>> > In case 1~3, there's an overhead after applying the patch. In other
>> cases, the
>> > patch improved the performance. As more duplicate values, the greater
>> the
>> > overhead after applying the patch.
>> >
>>
>> I think this overhead seems to be mostly due to the need to perform
>> tuples_equal multiple times for duplicate values. I don't know if
>> there is any simple way to avoid this without using the planner stuff
>> as was used in the previous approach. So, this brings us to the
>> question of whether just providing a way to disable/enable the use of
>> index scan for such cases is sufficient or if we need any other way.
>>
>> Tom, Andres, or others, do you have any suggestions on how to move
>> forward with this patch?
>>
>>
> Thanks for the feedback and testing. Due to personal circumstances,
> I could not reply the thread in the last 2 weeks, but I'll be more active
> going forward.
>
> I also agree that we should have a way to control the behavior.
>
> I created another patch (v24_0001_optionally_disable_index.patch) which
> can be applied
> on top of v23_0001_use_index_on_subs_when_pub_rep_ident_full.patch.
>
> The new patch adds a new *subscription_parameter* for both CREATE and
> ALTER subscription
> named: *enable_index_scan*. The setting is valid only when REPLICA
> IDENTITY is full.
>
> What do you think about such a patch to control the behavior? It does not
> give a per-relation
> level of control, but still useful for many cases.
>
> (Note that I'll be working on the other feedback in the email thread,
> wanted to send this earlier
> to hear some early thoughts on v24_0001_optionally_disable_index.patch).
>
>
>

Attachment Content-Type Size
v25_0001_optionally_disable_index.patch application/octet-stream 51.3 KB
v25_0001_use_index_on_subs_when_pub_rep_ident_full.patch application/octet-stream 64.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Maxim Orlov 2023-02-22 14:29:29 Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)
Previous Message Daniel Gustafsson 2023-02-22 14:10:11 Re: pg_regress: Treat child process failure as test failure