Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From: Önder Kalacı <onderkalaci(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, Marco Slot <marco(dot)slot(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date: 2023-03-12 15:07:17
Message-ID: CACawEhUvJEzcy8tNSkwDTacHmjaYo_2mfZ4dRr_6UcsUbf+nxA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Amit, all

>
> 1.
> +# Testcase start: SUBSCRIPTION USES INDEX WITH PUB/SUB DIFFERENT DATA VIA
> +# A UNIQUE INDEX THAT IS NOT PRIMARY KEY OR REPLICA IDENTITY
>
> No need to use Delete test separate for this.
>

Yeah, there is really no difference between update/delete for this patch,
so it makes sense. I initially added it for completeness for the coverage,
but as it has the perf overhead for the tests, I agree that we could
drop some of those.

>
> 2.
> +$node_publisher->safe_psql('postgres',
> + "UPDATE test_replica_id_full SET x = x + 1 WHERE x = 1;");
> +$node_publisher->safe_psql('postgres',
> + "UPDATE test_replica_id_full SET x = x + 1 WHERE x = 3;");
> +
> +# check if the index is used even when the index has NULL values
> +$node_subscriber->poll_query_until(
> + 'postgres', q{select idx_scan=2 from pg_stat_all_indexes where
> indexrelname = 'test_replica_id_full_idx';}
> +) or die "Timed out while waiting for check subscriber
> tap_sub_rep_full updates test_replica_id_full table";
>
> Here, I think only one update is sufficient.
>

done. I guess you requested this change so that we would wait
for idx_scan=1 not idx_scan=2, which could help.

> 3.
> +$node_subscriber->safe_psql('postgres',
> + "CREATE INDEX people_last_names ON people(lastname)");
> +
> +# wait until the index is created
> +$node_subscriber->poll_query_until(
> + 'postgres', q{select count(*)=1 from pg_stat_all_indexes where
> indexrelname = 'people_last_names';}
> +) or die "Timed out while waiting for creating index people_last_names";
>
> I don't think we need this poll.
>

true, not sure why I have this. none of the tests has this anyway.

> 4.
> +# update 2 rows
> +$node_publisher->safe_psql('postgres',
> + "UPDATE people SET firstname = 'no-name' WHERE firstname =
> 'first_name_1';");
> +$node_publisher->safe_psql('postgres',
> + "UPDATE people SET firstname = 'no-name' WHERE firstname =
> 'first_name_3' AND lastname = 'last_name_3';");
> +
> +# wait until the index is used on the subscriber
> +$node_subscriber->poll_query_until(
> + 'postgres', q{select idx_scan=2 from pg_stat_all_indexes where
> indexrelname = 'people_names';}
> +) or die "Timed out while waiting for check subscriber
> tap_sub_rep_full updates two rows via index scan with index on
> expressions and columns";
> +
> +$node_publisher->safe_psql('postgres',
> + "DELETE FROM people WHERE firstname = 'no-name';");
> +
> +# wait until the index is used on the subscriber
> +$node_subscriber->poll_query_until(
> + 'postgres', q{select idx_scan=4 from pg_stat_all_indexes where
> indexrelname = 'people_names';}
> +) or die "Timed out while waiting for check subscriber
> tap_sub_rep_full deletes two rows via index scan with index on
> expressions and columns";
>
> I think having one update or delete should be sufficient.
>

So, I dropped the 2nd update, but kept 1 update and 1 delete.
The latter deletes the tuple updated by the former. Seems like
an interesting test to keep.

Still, I dropped one of the extra poll_query_until, which is probably
good enough for this one? Let me know if you think otherwise.

>
> 5.
> +# update rows, moving them to other partitions
> +$node_publisher->safe_psql('postgres',
> + "UPDATE users_table_part SET value_1 = 0 WHERE user_id = 4;");
> +
> +# wait until the index is used on the subscriber
> +$node_subscriber->poll_query_until(
> + 'postgres', q{select sum(idx_scan)=1 from pg_stat_all_indexes where
> indexrelname ilike 'users_table_part_%';}
> +) or die "Timed out while waiting for updates on partitioned table with
> index";
> +
> +# delete rows from different partitions
> +$node_publisher->safe_psql('postgres',
> + "DELETE FROM users_table_part WHERE user_id = 1 and value_1 = 1;");
> +$node_publisher->safe_psql('postgres',
> + "DELETE FROM users_table_part WHERE user_id = 12 and value_1 = 12;");
> +
> +# wait until the index is used on the subscriber
> +$node_subscriber->poll_query_until(
> + 'postgres', q{select sum(idx_scan)=3 from pg_stat_all_indexes where
> indexrelname ilike 'users_table_part_%';}
> +) or die "Timed out while waiting for check subscriber
> tap_sub_rep_full updates partitioned table";
> +
>
> Can we combine these two polls?
>

Looking at it closely, the first one seems like an unnecessary poll anyway.
We can simply check the idxscan at the end of the test, I don't see
value in checking earlier.

>
> 6.
> +# Testcase start: SUBSCRIPTION USES INDEX WITH MULTIPLE ROWS AND COLUMNS,
> ALSO
> +# DROPS COLUMN
>
> In this test, let's try to update/delete 2-3 rows instead of 20. And
> after drop columns, let's keep just one of the update or delete.
>

changed to 3 rows

>
> 7. Apart from the above, I think it is better to use
> wait_for_catchup() consistently before trying to verify the data on
> the subscriber. We always use it in other tests. I guess here you are
> relying on the poll for index scans to ensure that data is replicated
> but I feel it may still be better to use wait_for_catchup().
>

Yes, that was my understanding & expectation. I'm not convinced that
wait_for_catchup() is strictly needed, as without catching up, how could
pg_stat_all_indexes be updated? Still, it is good to be consistent
with the test suite. So, applied your suggestion.

Similarly, wait_for_subscription_sync uses the publisher name and
> appname in other tests, so it is better to be consistent. It can avoid
> random failures by ensuring data is synced.
>

makes sense.

I'll attach a new patch in the next e-mail, along with your
other comments.

Thanks,
Onder KALACI

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Önder Kalacı 2023-03-12 15:07:21 Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Previous Message vignesh C 2023-03-12 14:38:23 Implement IF NOT EXISTS for CREATE PUBLICATION AND CREATE SUBSCRIPTION