Re: Replication slot stats misgivings

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Subject: Re: Replication slot stats misgivings
Date: 2021-04-14 01:50:09
Message-ID: CALDaNm1ZkBc=ycQyi=mavZK0CUg-iQPXUc7mxszNvk7L5+7TKA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Apr 13, 2021 at 10:46 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
wrote:
>
> On Mon, Apr 12, 2021 at 9:16 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> >
> > On Mon, Apr 12, 2021 at 4:46 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
wrote:
> > >
> > > On Sat, Apr 10, 2021 at 6:51 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > > >
> > >
> > > Thanks, 0001 and 0002 look good to me. I have a minor comment for
0002.
> > >
> > > <entry role="catalog_table_entry"><para role="column_definition">
> > > + <structfield>total_bytes</structfield><type>bigint</type>
> > > + </para>
> > > + <para>
> > > + Amount of decoded transactions data sent to the decoding
output plugin
> > > + while decoding the changes from WAL for this slot. This can
be used to
> > > + gauge the total amount of data sent during logical decoding.
> > >
> > > Can we slightly extend it to say something like: Note that this
> > > includes the bytes streamed and or spilled. Similarly, we can extend
> > > it for total_txns.
> > >
> >
> > Thanks for the comments, the comments are fixed in the v8 patch
attached.
> > Thoughts?
>
> Here are review comments on new TAP tests:

Thanks for the comments.

> +# Create replication slots.
> +$node->safe_psql('postgres',
> + "SELECT 'init' FROM
> pg_create_logical_replication_slot('regression_slot1',
> 'test_decoding')");
> +$node->safe_psql('postgres',
> + "SELECT 'init' FROM
> pg_create_logical_replication_slot('regression_slot2',
> 'test_decoding')");
> +$node->safe_psql('postgres',
> + "SELECT 'init' FROM
> pg_create_logical_replication_slot('regression_slot3',
> 'test_decoding')");
> +$node->safe_psql('postgres',
> + "SELECT 'init' FROM
> pg_create_logical_replication_slot('regression_slot4',
> 'test_decoding')");
>
> and
>
> +
> +$node->safe_psql('postgres',
> + "SELECT data FROM
> pg_logical_slot_get_changes('regression_slot1', NULL, NULL,
> 'include-xids', '0', 'skip-empty-xacts', '1')");
> +$node->safe_psql('postgres',
> + "SELECT data FROM
> pg_logical_slot_get_changes('regression_slot2', NULL, NULL,
> 'include-xids', '0', 'skip-empty-xacts', '1')");
> +$node->safe_psql('postgres',
> + "SELECT data FROM
> pg_logical_slot_get_changes('regression_slot3', NULL, NULL,
> 'include-xids', '0', 'skip-empty-xacts', '1')");
> +$node->safe_psql('postgres',
> + "SELECT data FROM
> pg_logical_slot_get_changes('regression_slot4', NULL, NULL,
> 'include-xids', '0', 'skip-empty-xacts', '1')");
>
> I think we can do those similar queries in a single psql connection
> like follows:
>
> # Create replication slots.
> $node->safe_psql('postgres',
> qq[
> SELECT pg_create_logical_replication_slot('regression_slot1',
'test_decoding');
> SELECT pg_create_logical_replication_slot('regression_slot2',
'test_decoding');
> SELECT pg_create_logical_replication_slot('regression_slot3',
'test_decoding');
> SELECT pg_create_logical_replication_slot('regression_slot4',
'test_decoding');
> ]);
>
> and
>
> $node->safe_psql('postgres',
> qq[
> SELECT data FROM pg_logical_slot_get_changes('regression_slot1', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
> SELECT data FROM pg_logical_slot_get_changes('regression_slot2', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
> SELECT data FROM pg_logical_slot_get_changes('regression_slot3', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
> SELECT data FROM pg_logical_slot_get_changes('regression_slot4', NULL,
> NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
> ]);
>

Modified.

> ---
> +# Wait for the statistics to be updated.
> +my $slot1_stat_check_query =
> + "SELECT count(1) = 1 FROM pg_stat_replication_slots WHERE slot_name
> = 'regression_slot1' AND total_txns > 0 AND total_bytes > 0;";
> +my $slot2_stat_check_query =
> + "SELECT count(1) = 1 FROM pg_stat_replication_slots WHERE slot_name
> = 'regression_slot2' AND total_txns > 0 AND total_bytes > 0;";
> +my $slot3_stat_check_query =
> + "SELECT count(1) = 1 FROM pg_stat_replication_slots WHERE slot_name
> = 'regression_slot3' AND total_txns > 0 AND total_bytes > 0;";
> +my $slot4_stat_check_query =
> + "SELECT count(1) = 1 FROM pg_stat_replication_slots WHERE slot_name
> = 'regression_slot4' AND total_txns > 0 AND total_bytes > 0;";
> +
> +# Verify that the statistics have been updated.
> +$node->poll_query_until('postgres', $slot1_stat_check_query)
> + or die "Timed out while waiting for statistics to be updated";
> +$node->poll_query_until('postgres', $slot2_stat_check_query)
> + or die "Timed out while waiting for statistics to be updated";
> +$node->poll_query_until('postgres', $slot3_stat_check_query)
> + or die "Timed out while waiting for statistics to be updated";
> +$node->poll_query_until('postgres', $slot4_stat_check_query)
> + or die "Timed out while waiting for statistics to be updated";
>
> We can simplify the above code to something like:
>
> $node->poll_query_until(
> 'postgres', qq[
> SELECT count(slot_name) >= 4
> FROM pg_stat_replication_slots
> WHERE slot_name ~ 'regression_slot'
> AND total_txns > 0
> AND total_bytes > 0;
> ]) or die "Timed out while waiting for statistics to be updated";
>

Modified.

> ---
> +# Test to remove one of the replication slots and adjust
max_replication_slots
> +# accordingly to the number of slots and verify replication statistics
data is
> +# fine after restart.
>
> I think it's better if we explain in detail what cases we're trying to
> test. How about the following description?
>
> Test to remove one of the replication slots and adjust
> max_replication_slots accordingly to the number of slots. This leads
> to a mismatch of the number of slots between in the stats file and on
> shared memory, simulating the message for dropping a slot got lost. We
> verify replication statistics data is fine after restart.

Slightly reworded and modified it.

These comments are fixed as part of the v9 patch posted at [1].
[1] -
https://www.postgresql.org/message-id/CALDaNm3CtPUYkFjPhzX0AcuRiK2MzdCR%2B_w8ok1kCcykveuL2Q%40mail.gmail.com

Regards,
Vignesh

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2021-04-14 02:08:27 Re: Converting contrib SQL functions to new style
Previous Message Michael Paquier 2021-04-14 01:38:17 Re: Extensions not dumped when --schema is used