Re: Replication slot stats misgivings

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Subject: Re: Replication slot stats misgivings
Date: 2021-04-06 11:58:25
Message-ID: CALDaNm3jhjd5P8-w=AKRORa0Q_A1sprG8Qk7A5-0GaT1J7dZSw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Apr 6, 2021 at 12:19 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, Apr 5, 2021 at 8:51 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> >
>
> Few comments on the latest patches:
> Comments on 0001
> --------------------------------
> 1.
> @@ -659,6 +661,8 @@ ReorderBufferTXNByXid(ReorderBuffer *rb,
> TransactionId xid, bool create,
> dlist_push_tail(&rb->toplevel_by_lsn, &txn->node);
> AssertTXNLsnOrder(rb);
> }
> +
> + rb->totalTxns++;
> }
> else
> txn = NULL; /* not found and not asked to create */
> @@ -3078,6 +3082,7 @@ ReorderBufferChangeMemoryUpdate(ReorderBuffer *rb,
> {
> txn->size += sz;
> rb->size += sz;
> + rb->totalBytes += sz;
>
> I think this will include the txns that are aborted and for which we
> don't send anything. It might be better to update these stats in
> ReorderBufferProcessTXN or ReorderBufferReplay where we are sure we
> have sent the data. We can probably use size/total_size in txn. We
> need to be careful to not double include the totaltxn or totalBytes
> for streaming xacts as we might process the same txn multiple times.
>
> 2.
> + Amount of decoded transactions data sent to the decoding output plugin
> + while decoding the changes from WAL for this slot. This and total_txns
> + for this slot can be used to gauge the total amount of data during
> + logical decoding.
>
> I think we can slightly modify the second line here: "This can be used
> to gauge the total amount of data sent during logical decoding.". Why
> we need to include total_txns along with it.
>
> 0002
> ----------
> 3.
> + -- we don't want to wait forever; loop will exit after 30 seconds
> + FOR i IN 1 .. 5 LOOP
> +
> ...
> ...
> +
> + -- wait a little
> + perform pg_sleep_for('100 milliseconds');
>
> I think this loop needs to be executed 300 times instead of 5 times,
> if the above comments and code needs to do what is expected here?
>
>
> 4.
> +# Test to drop one of the subscribers and verify replication statistics data is
> +# fine after publisher is restarted.
> +$node->safe_psql('postgres', "SELECT
> pg_drop_replication_slot('regression_slot4')");
> +
> +$node->stop;
> +$node->start;
> +
> +# Verify statistics data present in pg_stat_replication_slots are sane after
> +# publisher is restarted
> +$result = $node->safe_psql('postgres',
> + "SELECT slot_name, total_txns > 0 AS total_txn, total_bytes > 0 AS total_bytes
> + FROM pg_stat_replication_slots ORDER BY slot_name"
>
> Various comments in the 0002 refer to publisher/subscriber which is
> not what we are using here.
>
> 5.
> +# Create table.
> +$node->safe_psql('postgres',
> + "CREATE TABLE test_repl_stat(col1 int)");
> +$node->safe_psql('postgres',
> + "SELECT data FROM
> pg_logical_slot_get_changes('regression_slot1', NULL, NULL,
> 'include-xids', '0', 'skip-empty-xacts', '1')");
> +$node->safe_psql('postgres',
> + "SELECT data FROM
> pg_logical_slot_get_changes('regression_slot2', NULL, NULL,
> 'include-xids', '0', 'skip-empty-xacts', '1')");
> +$node->safe_psql('postgres',
> + "SELECT data FROM
> pg_logical_slot_get_changes('regression_slot3', NULL, NULL,
> 'include-xids', '0', 'skip-empty-xacts', '1')");
> +$node->safe_psql('postgres',
> + "SELECT data FROM
> pg_logical_slot_get_changes('regression_slot4', NULL, NULL,
> 'include-xids', '0', 'skip-empty-xacts', '1')");
>
> I think we can save the above calls to pg_logical_slot_get_changes if
> we create table before creating the slots in this test.
>
> 0003
> ---------
> 6. In the tests/code, publisher is used at multiple places. I think
> that is not required because this can happen via plugin as well.
> 7.
> + if (max_replication_slots == nReplSlotStats)
> + {
> + ereport(pgStatRunningInCollector ? LOG : WARNING,
> + (errmsg("skipping \"%s\" replication slot statistics as
> pg_stat_replication_slots does not have enough slots",
> + NameStr(replSlotStats[nReplSlotStats].slotname))));
> + memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));
>
> Do we need memset here? Isn't this location is past the max location?

Thanks for the comments, I will fix and post a patch for this soon.

Regards,
Vignesh

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Kazutaka Onishi 2021-04-06 12:06:33 Re: TRUNCATE on foreign table
Previous Message osumi.takamichi@fujitsu.com 2021-04-06 11:48:00 RE: Stronger safeguard for archive recovery not to miss data