From: | Önder Kalacı <onderkalaci(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | vignesh C <vignesh21(at)gmail(dot)com>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>, "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, Marco Slot <marco(dot)slot(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher |
Date: | 2023-03-12 15:07:17 |
Message-ID: | CACawEhUvJEzcy8tNSkwDTacHmjaYo_2mfZ4dRr_6UcsUbf+nxA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi Amit, all
>
> 1.
> +# Testcase start: SUBSCRIPTION USES INDEX WITH PUB/SUB DIFFERENT DATA VIA
> +# A UNIQUE INDEX THAT IS NOT PRIMARY KEY OR REPLICA IDENTITY
>
> No need to use Delete test separate for this.
>
Yeah, there is really no difference between update/delete for this patch,
so it makes sense. I initially added it for completeness for the coverage,
but as it has the perf overhead for the tests, I agree that we could
drop some of those.
>
> 2.
> +$node_publisher->safe_psql('postgres',
> + "UPDATE test_replica_id_full SET x = x + 1 WHERE x = 1;");
> +$node_publisher->safe_psql('postgres',
> + "UPDATE test_replica_id_full SET x = x + 1 WHERE x = 3;");
> +
> +# check if the index is used even when the index has NULL values
> +$node_subscriber->poll_query_until(
> + 'postgres', q{select idx_scan=2 from pg_stat_all_indexes where
> indexrelname = 'test_replica_id_full_idx';}
> +) or die "Timed out while waiting for check subscriber
> tap_sub_rep_full updates test_replica_id_full table";
>
> Here, I think only one update is sufficient.
>
done. I guess you requested this change so that we would wait
for idx_scan=1 not idx_scan=2, which could help.
> 3.
> +$node_subscriber->safe_psql('postgres',
> + "CREATE INDEX people_last_names ON people(lastname)");
> +
> +# wait until the index is created
> +$node_subscriber->poll_query_until(
> + 'postgres', q{select count(*)=1 from pg_stat_all_indexes where
> indexrelname = 'people_last_names';}
> +) or die "Timed out while waiting for creating index people_last_names";
>
> I don't think we need this poll.
>
true, not sure why I have this. none of the tests has this anyway.
> 4.
> +# update 2 rows
> +$node_publisher->safe_psql('postgres',
> + "UPDATE people SET firstname = 'no-name' WHERE firstname =
> 'first_name_1';");
> +$node_publisher->safe_psql('postgres',
> + "UPDATE people SET firstname = 'no-name' WHERE firstname =
> 'first_name_3' AND lastname = 'last_name_3';");
> +
> +# wait until the index is used on the subscriber
> +$node_subscriber->poll_query_until(
> + 'postgres', q{select idx_scan=2 from pg_stat_all_indexes where
> indexrelname = 'people_names';}
> +) or die "Timed out while waiting for check subscriber
> tap_sub_rep_full updates two rows via index scan with index on
> expressions and columns";
> +
> +$node_publisher->safe_psql('postgres',
> + "DELETE FROM people WHERE firstname = 'no-name';");
> +
> +# wait until the index is used on the subscriber
> +$node_subscriber->poll_query_until(
> + 'postgres', q{select idx_scan=4 from pg_stat_all_indexes where
> indexrelname = 'people_names';}
> +) or die "Timed out while waiting for check subscriber
> tap_sub_rep_full deletes two rows via index scan with index on
> expressions and columns";
>
> I think having one update or delete should be sufficient.
>
So, I dropped the 2nd update, but kept 1 update and 1 delete.
The latter deletes the tuple updated by the former. Seems like
an interesting test to keep.
Still, I dropped one of the extra poll_query_until, which is probably
good enough for this one? Let me know if you think otherwise.
>
> 5.
> +# update rows, moving them to other partitions
> +$node_publisher->safe_psql('postgres',
> + "UPDATE users_table_part SET value_1 = 0 WHERE user_id = 4;");
> +
> +# wait until the index is used on the subscriber
> +$node_subscriber->poll_query_until(
> + 'postgres', q{select sum(idx_scan)=1 from pg_stat_all_indexes where
> indexrelname ilike 'users_table_part_%';}
> +) or die "Timed out while waiting for updates on partitioned table with
> index";
> +
> +# delete rows from different partitions
> +$node_publisher->safe_psql('postgres',
> + "DELETE FROM users_table_part WHERE user_id = 1 and value_1 = 1;");
> +$node_publisher->safe_psql('postgres',
> + "DELETE FROM users_table_part WHERE user_id = 12 and value_1 = 12;");
> +
> +# wait until the index is used on the subscriber
> +$node_subscriber->poll_query_until(
> + 'postgres', q{select sum(idx_scan)=3 from pg_stat_all_indexes where
> indexrelname ilike 'users_table_part_%';}
> +) or die "Timed out while waiting for check subscriber
> tap_sub_rep_full updates partitioned table";
> +
>
> Can we combine these two polls?
>
Looking at it closely, the first one seems like an unnecessary poll anyway.
We can simply check the idxscan at the end of the test, I don't see
value in checking earlier.
>
> 6.
> +# Testcase start: SUBSCRIPTION USES INDEX WITH MULTIPLE ROWS AND COLUMNS,
> ALSO
> +# DROPS COLUMN
>
> In this test, let's try to update/delete 2-3 rows instead of 20. And
> after drop columns, let's keep just one of the update or delete.
>
changed to 3 rows
>
> 7. Apart from the above, I think it is better to use
> wait_for_catchup() consistently before trying to verify the data on
> the subscriber. We always use it in other tests. I guess here you are
> relying on the poll for index scans to ensure that data is replicated
> but I feel it may still be better to use wait_for_catchup().
>
Yes, that was my understanding & expectation. I'm not convinced that
wait_for_catchup() is strictly needed, as without catching up, how could
pg_stat_all_indexes be updated? Still, it is good to be consistent
with the test suite. So, applied your suggestion.
Similarly, wait_for_subscription_sync uses the publisher name and
> appname in other tests, so it is better to be consistent. It can avoid
> random failures by ensuring data is synced.
>
makes sense.
I'll attach a new patch in the next e-mail, along with your
other comments.
Thanks,
Onder KALACI
From | Date | Subject | |
---|---|---|---|
Next Message | Önder Kalacı | 2023-03-12 15:07:21 | Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher |
Previous Message | vignesh C | 2023-03-12 14:38:23 | Implement IF NOT EXISTS for CREATE PUBLICATION AND CREATE SUBSCRIPTION |