Re: Replication slot stats misgivings

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Subject: Re: Replication slot stats misgivings
Date: 2021-04-07 09:20:56
Message-ID: CALDaNm195xL1bZq4VHKt=-wmXJ5kC4jxKh7LXK+pN7ESFjHO+w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Apr 6, 2021 at 12:19 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, Apr 5, 2021 at 8:51 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> >
>
> Few comments on the latest patches:
> Comments on 0001
> --------------------------------
> 1.
> @@ -659,6 +661,8 @@ ReorderBufferTXNByXid(ReorderBuffer *rb,
> TransactionId xid, bool create,
> dlist_push_tail(&rb->toplevel_by_lsn, &txn->node);
> AssertTXNLsnOrder(rb);
> }
> +
> + rb->totalTxns++;
> }
> else
> txn = NULL; /* not found and not asked to create */
> @@ -3078,6 +3082,7 @@ ReorderBufferChangeMemoryUpdate(ReorderBuffer *rb,
> {
> txn->size += sz;
> rb->size += sz;
> + rb->totalBytes += sz;
>
> I think this will include the txns that are aborted and for which we
> don't send anything. It might be better to update these stats in
> ReorderBufferProcessTXN or ReorderBufferReplay where we are sure we
> have sent the data. We can probably use size/total_size in txn. We
> need to be careful to not double include the totaltxn or totalBytes
> for streaming xacts as we might process the same txn multiple times.

Modified it to update total_byte for spilled transactions and streamed
transactions where spill_bytes and stream_bytes are updated. For
non-stream/spilled transactions, total_bytes is updated in
ReorderBufferProcessTXN.

> 2.
> + Amount of decoded transactions data sent to the decoding output plugin
> + while decoding the changes from WAL for this slot. This and total_txns
> + for this slot can be used to gauge the total amount of data during
> + logical decoding.
>
> I think we can slightly modify the second line here: "This can be used
> to gauge the total amount of data sent during logical decoding.". Why
> we need to include total_txns along with it.

Modified it.

> 0002
> ----------
> 3.
> + -- we don't want to wait forever; loop will exit after 30 seconds
> + FOR i IN 1 .. 5 LOOP
> +
> ...
> ...
> +
> + -- wait a little
> + perform pg_sleep_for('100 milliseconds');
>
> I think this loop needs to be executed 300 times instead of 5 times,
> if the above comments and code needs to do what is expected here?
>

Modified it.

> 4.
> +# Test to drop one of the subscribers and verify replication statistics data is
> +# fine after publisher is restarted.
> +$node->safe_psql('postgres', "SELECT
> pg_drop_replication_slot('regression_slot4')");
> +
> +$node->stop;
> +$node->start;
> +
> +# Verify statistics data present in pg_stat_replication_slots are sane after
> +# publisher is restarted
> +$result = $node->safe_psql('postgres',
> + "SELECT slot_name, total_txns > 0 AS total_txn, total_bytes > 0 AS total_bytes
> + FROM pg_stat_replication_slots ORDER BY slot_name"
>
> Various comments in the 0002 refer to publisher/subscriber which is
> not what we are using here.

Removed references to publisher/subscriber.

> 5.
> +# Create table.
> +$node->safe_psql('postgres',
> + "CREATE TABLE test_repl_stat(col1 int)");
> +$node->safe_psql('postgres',
> + "SELECT data FROM
> pg_logical_slot_get_changes('regression_slot1', NULL, NULL,
> 'include-xids', '0', 'skip-empty-xacts', '1')");
> +$node->safe_psql('postgres',
> + "SELECT data FROM
> pg_logical_slot_get_changes('regression_slot2', NULL, NULL,
> 'include-xids', '0', 'skip-empty-xacts', '1')");
> +$node->safe_psql('postgres',
> + "SELECT data FROM
> pg_logical_slot_get_changes('regression_slot3', NULL, NULL,
> 'include-xids', '0', 'skip-empty-xacts', '1')");
> +$node->safe_psql('postgres',
> + "SELECT data FROM
> pg_logical_slot_get_changes('regression_slot4', NULL, NULL,
> 'include-xids', '0', 'skip-empty-xacts', '1')");
>
> I think we can save the above calls to pg_logical_slot_get_changes if
> we create table before creating the slots in this test.
>

Modified it.

> 0003
> ---------
> 6. In the tests/code, publisher is used at multiple places. I think
> that is not required because this can happen via plugin as well.

Removed references to publisher.

> 7.
> + if (max_replication_slots == nReplSlotStats)
> + {
> + ereport(pgStatRunningInCollector ? LOG : WARNING,
> + (errmsg("skipping \"%s\" replication slot statistics as
> pg_stat_replication_slots does not have enough slots",
> + NameStr(replSlotStats[nReplSlotStats].slotname))));
> + memset(&replSlotStats[nReplSlotStats], 0, sizeof(PgStat_ReplSlotStats));
>
> Do we need memset here? Isn't this location is past the max location?

That is not required, I have modified it.
Attached v4 patch has the fixes for the same.

Regards,
Vignesh

Attachment Content-Type Size
v4-0001-Added-total-txns-and-total-txn-bytes-to-replicati.patch text/x-patch 27.6 KB
v4-0002-Added-tests-for-verification-of-logical-replicati.patch text/x-patch 5.7 KB
v4-0003-Handle-overwriting-of-replication-slot-statistic-.patch text/x-patch 4.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Yugo NAGATA 2021-04-07 09:25:37 Re: Implementing Incremental View Maintenance
Previous Message Drouvot, Bertrand 2021-04-07 09:06:11 Re: Minimal logical decoding on standbys