Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From: Önder Kalacı <onderkalaci(at)gmail(dot)com>
To: "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>
Subject: Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date: 2022-10-11 12:44:06
Message-ID: CACawEhX0153mUsEAHxmgRDKZhq69Ygg5CGzasKgqJOTDgro4og@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Kuroda Hayato,

> ===
> 01. relation.c - GetCheapestReplicaIdentityFullPath
>
> ```
> * The reason that the planner would not pick partial indexes and
> indexes
> * with only expressions based on the way currently
> baserestrictinfos are
> * formed (e.g., col_1 = $1 ... AND col_N = $2).
> ```
>
> Is "col_N = $2" a typo? I think it should be "col_N = $N" or "attr1 = $1
> ... AND attrN = $N".
>
>
Yes, it is a typo, fixed now.

> ===
> 02. 032_subscribe_use_index.pl
>
> If a table has a primary key on the subscriber, it will be used even if
> enable_indexscan is false(legacy behavior).
> Should we test it?
>
>
Yes, good idea. I added two tests, one test that we cannot use regular
indexes when index scan is disabled, and another one that we use replica
identity index when index scan is disabled. This is useful to make sure if
someone changes the behavior can see the impact.

> ~~~
> 03. 032_subscribe_use_index.pl - SUBSCRIPTION RE-CALCULATES INDEX AFTER
> CREATE/DROP INDEX
>
> I think this test seems to be not trivial, so could you write down the
> motivation?
>

makes sense, done

>
> ~~~
> 04. 032_subscribe_use_index.pl - SUBSCRIPTION RE-CALCULATES INDEX AFTER
> CREATE/DROP INDEX
>
> ```
> # wait until the index is created
> $node_subscriber->poll_query_until(
> 'postgres', q{select count(*)=1 from pg_stat_all_indexes where
> indexrelname = 'test_replica_id_full_idx';}
> ) or die "Timed out while waiting for check subscriber tap_sub_rep_full_0
> updates one row via index";
> ```
>
> CREATE INDEX is a synchronous behavior, right? If so we don't have to wait
> here.
> ...And the comment in case of die may be wrong.
> (There are some cases like this)
>

It is not about CREATE INDEX being async. It is about pg_stat_all_indexes
being async. If we do not wait, the tests become flaky, because sometimes
the update has not been reflected in the view immediately.

This is explained here: PostgreSQL: Documentation: 14: 28.2. The Statistics
Collector <https://www.postgresql.org/docs/current/monitoring-stats.html>

*When using the statistics to monitor collected data, it is important to
> realize that the information does not update instantaneously. Each
> individual server process transmits new statistical counts to the collector
> just before going idle; so a query or transaction still in progress does
> not affect the displayed totals. Also, the collector itself emits a new
> report at most once per PGSTAT_STAT_INTERVAL milliseconds (500 ms unless
> altered while building the server). So the displayed information lags
> behind actual activity. However, current-query information collected by
> track_activities is always up-to-date.*
>

>
> ~~~
> 05. 032_subscribe_use_index.pl - SUBSCRIPTION USES INDEX UPDATEs MULTIPLE
> ROWS
>
> ```
> # Testcase start: SUBSCRIPTION USES INDEX UPDATEs MULTIPLE ROWS
> #
> # Basic test where the subscriber uses index
> # and touches 50 rows with UPDATE
> ```
>
> "touches 50 rows with UPDATE" -> "updates 50 rows", per other tests.
>
> fixed

> ~~~
> 06. 032_subscribe_use_index.pl - SUBSCRIPTION CAN UPDATE THE INDEX IT
> USES AFTER ANALYZE
>
> I think this test seems to be not trivial, so could you write down the
> motivation?
> (Same as Re-calclate)
>

sure, done

>
> ~~~
> 07. 032_subscribe_use_index.pl - SUBSCRIPTION CAN UPDATE THE INDEX IT
> USES AFTER ANALYZE
>
> ```
> # show that index_b is not used
> $node_subscriber->poll_query_until(
> 'postgres', q{select idx_scan=0 from pg_stat_all_indexes where
> indexrelname = 'index_b';}
> ) or die "Timed out while waiting for check subscriber tap_sub_rep_full
> updates two rows via index scan with index on high cardinality column-2";
> ```
>
> I think we don't have to wait here, is() should be used instead.
> poll_query_until() should be used only when idx_scan>0 is checked.
> (There are some cases like this)
>

Yes, makes sense

>
> ~~~
> 08. 032_subscribe_use_index.pl - SUBSCRIPTION USES INDEX ON PARTITIONED
> TABLES
>
> ```
> # make sure that the subscriber has the correct data
> $node_subscriber->poll_query_until(
> 'postgres', q{select sum(user_id+value_1+value_2)=550070 AND
> count(DISTINCT(user_id,value_1, value_2))=981 from users_table_part;}
> ) or die "ensure subscriber has the correct data at the end of the test";
> ```
>
>
Ah, for this case, we already have is() checks for the same results, this
is just a left-over from the earlier iterations

> I think we can replace it to wait_for_catchup() and is()...
> Moreover, we don't have to wait here because in above line we wait until
> the index is used on the subscriber.
> (There are some cases like this)
>

Fixed a few more such cases.

Thanks for the review! Attached v16.

Onder KALACI

Attachment Content-Type Size
v16_0001_use_index_on_subs_when_pub_rep_ident_full.patch application/x-patch 88.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message hubert depesz lubaczewski 2022-10-11 12:45:24 Re: Re: Is there any plan to support online schem change in postgresql?
Previous Message Laurenz Albe 2022-10-11 12:37:25 Make EXPLAIN generate a generic plan for a parameterized query