RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From: "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>
To: "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, Önder Kalacı <onderkalaci(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Marco Slot <marco(dot)slot(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date: 2023-02-14 09:35:58
Message-ID: OSZPR01MB6310D139F020B8A2C84EB009FDA29@OSZPR01MB6310.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Feb 13, 2023 7:01 PM shiy(dot)fnst(at)fujitsu(dot)com <shiy(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Thu, Feb 2, 2023 4:34 PM Önder Kalacı <onderkalaci(at)gmail(dot)com> wrote:
> >
> >>
> >> and if there's more
> >> than one candidate index pick any one. Additionally, we can allow
> >> disabling the use of an index scan for this particular case. If we are
> >> too worried about API change for allowing users to specify the index
> >> then we can do that later or as a separate patch.
> >>
> >
> > On v23, I dropped the planner support for picking the index. Instead, it simply
> > iterates over the indexes and picks the first one that is suitable.
> >
> > I'm currently thinking on how to enable users to override this decision.
> > One option I'm leaning towards is to add a syntax like the following:
> >
> > ALTER SUBSCRIPTION .. ALTER TABLE ... SET INDEX ...
> >
> > Though, that should probably be a seperate patch. I'm going to work
> > on that, but still wanted to share v23 given picking the index sounds
> > complementary, not strictly required at this point.
> >
>
> Thanks for your patch. Here are some comments.
>

Hi,

Here are some comments on the test cases.

1. in test case "SUBSCRIPTION RE-CALCULATES INDEX AFTER CREATE/DROP INDEX"
+# now, ingest more data and create index on column y which has higher cardinality
+# so that the future commands use the index on column y
+$node_publisher->safe_psql('postgres',
+ "INSERT INTO test_replica_id_full SELECT 50, i FROM generate_series(0,3100)i;");
+$node_subscriber->safe_psql('postgres',
+ "CREATE INDEX test_replica_id_full_idy ON test_replica_id_full(y)");

We don't pick the cheapest index in the current patch, so should we modify this
part of the test?

BTW, the following comment in FindLogicalRepUsableIndex() need to be changed,
too.

+ * We are looking for one more opportunity for using an index. If
+ * there are any indexes defined on the local relation, try to pick
+ * the cheapest index.

2. Is there any reasons why we need the test case "SUBSCRIPTION USES INDEX WITH
DROPPED COLUMNS"? Has there been a problem related to dropped columns before?

3. in test case "SUBSCRIPTION USES INDEX ON PARTITIONED TABLES"
+# deletes rows and moves between partitions
+$node_publisher->safe_psql('postgres',
+ "DELETE FROM users_table_part WHERE user_id = 1 and value_1 = 1;");
+$node_publisher->safe_psql('postgres',
+ "DELETE FROM users_table_part WHERE user_id = 12 and value_1 = 12;");

"moves between partitions" in the comment seems wrong.

4. in test case "SUBSCRIPTION DOES NOT USE INDEXES WITH ONLY EXPRESSIONS"
+# update 2 rows
+$node_publisher->safe_psql('postgres',
+ "UPDATE people SET firstname = 'Nan' WHERE firstname = 'first_name_1';");
+$node_publisher->safe_psql('postgres',
+ "UPDATE people SET firstname = 'Nan' WHERE firstname = 'first_name_2' AND lastname = 'last_name_2';");
+
+# make sure the index is not used on the subscriber
+$result = $node_subscriber->safe_psql('postgres',
+ "select idx_scan from pg_stat_all_indexes where indexrelname = 'people_names'");
+is($result, qq(0), 'ensure subscriber tap_sub_rep_full updates two rows via seq. scan with index on expressions');
+

I think it would be better to call wait_for_catchup() before the check because
we want to check the index is NOT used. Otherwise the check may pass because the
rows have not yet been updated on subscriber.

5. in test case "SUBSCRIPTION BEHAVIOR WITH ENABLE_INDEXSCAN"
+# show that index is not used even when enable_indexscan=false
+$result = $node_subscriber->safe_psql('postgres',
+ "select idx_scan from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_idx'");
+is($result, qq(0), 'ensure subscriber has not used index with enable_indexscan=false');

Should we remove the word "even" in the comment?

6.
In each test case we re-create publications, subscriptions, and tables. Could we
create only one publication and one subscription at the beginning, and use them
in all test cases? I think this can save some time running the test file.

Regards,
Shi Yu

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrey Lepikhov 2023-02-14 10:01:41 Re: [PoC] Reducing planning time when tables have many partitions
Previous Message Alvaro Herrera 2023-02-14 09:27:26 Re: Support logical replication of DDLs