From: | shveta malik <shveta(dot)malik(at)gmail(dot)com> |
---|---|
To: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com> |
Subject: | Re: Report bytes and transactions actually sent downtream |
Date: | 2025-09-18 05:21:50 |
Message-ID: | CAJpy0uAxw_BKOTAtbnPdBcBCMS6jqLiaCraqLjwhRvuytb008A@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Aug 27, 2025 at 7:14 PM Bertrand Drouvot
<bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
>
> Hi,
>
> On Thu, Jul 24, 2025 at 12:24:26PM +0530, Ashutosh Bapat wrote:
> > Here's the next patch which considers all the discussion so far. It
> > adds four fields to pg_stat_replication_slots.
> > - plugin - name of the output plugin
>
> Is this one needed? (we could get it with a join on pg_replication_slots)
>
In my opinion, when there are other plugin_* fields present, including
the plugin name directly here seems like a better approach. So, +1 for
the plugin field.
> > - plugin_filtered_bytes - reports the amount of changes filtered
> > out by the output plugin
> > - plugin_sent_txns - the amount of transactions sent downstream by
> > the output plugin
> > - plugin_sent_bytes - the amount of data sent downstream by the
> > outputplugin.
> >
> > There are some points up for a discussion:
> > 1. pg_stat_reset_replication_slot() zeroes out the statistics entry by
> > calling pgstat_reset() or pgstat_reset_of_kind() which don't know
> > about the contents of the entry. So
> > PgStat_StatReplSlotEntry::plugin_has_stats is set to false and plugin
> > stats are reported as NULL, instead of zero, immediately after reset.
> > This is the same case when the stats is queried immediately after the
> > statistics is initialized and before any stats are reported. We could
> > instead make it report
> > zero, if we save the plugin_has_stats and restore it after reset. But
> > doing that in pgstat_reset_of_kind() seems like an extra overhead + we
> > will need to write a function to find all replication slot entries.
I tried to think of an approach where we can differentiate between the
cases 'not initialized' and 'reset' ones with the values. Say instead
of plugin_has_stats, if we have plugin_stats_status, then we can
maintain status like -1(not initialized), 0(reset). But this too will
complicate the code further. Personally, I’m okay with NULL values
appearing even after a reset, especially since the documentation
explains this clearly.
>
> > 2. There's also a bit of asymmetry in the way sent_bytes is handled.
> > The code which actually sends the logical changes to the downstream is
> > part of the core code
> > but the format of the change and hence the number of bytes sent is
> > decided by the plugin. It's a stat related to plugin but maintained by
> > the core code. The patch implements it as a plugin stat (so the
> > corresponding column has "plugin" prefix
>
> The way it is done makes sense to me.
>
> > 3. The names of new columns have the prefix "plugin_" but the internal
> > variables tracking those don't for the sake of brevity. If you prefer
> > to have the same prefix for the internal variables, I can change that.
>
I am okay either way.
Few comments:
1)
postgres=# select slot_name,
total_bytes,plugin_filtered_bytes,plugin_sent_bytes from
pg_stat_replication_slots order by slot_name;
slot_name | total_bytes | plugin_filtered_bytes | plugin_sent_bytes
-----------+-------------+-----------------------+-------------------
slot1 | 800636 | 793188 | 211
sub1 | 401496 | 132712 | 84041
sub2 | 401496 | 396184 | 674
sub3 | 401496 | 145912 | 79959
(4 rows)
Currently it looks quite confusing. 'total_bytes' gives a sense that
it has to be a sum of filtered and sent. But they are no way like
that. In the thread earlier there was a proposal to change the name to
reordered_txns, reordered_bytes. That looks better to me. It will give
clarity without even someone digging into docs.
2)
Tried to verify all filtered data tests, seems to work well. Also I
tried tracking the usage of OutputPluginWrite() to see if there is any
other place where data needs to be considered as filtered-data.
Encountered this:
send_relation_and_attrs has:
if (!logicalrep_should_publish_column(att, columns,
include_gencols_type))
continue;
if (att->atttypid < FirstGenbkiObjectId)
continue;
But I don't think it needs to be considered as filtered data. This is
mostly schema related info. But I wanted to confirm once. Thoughts?
3)
+-- total_txns may vary based on the background activity but sent_txns should
+-- always be 1 since the background transactions are always skipped. Filtered
+-- bytes would be set only when there's a change that was passed to the plugin
+-- but was filtered out. Depending upon the background transactions, filtered
+-- bytes may or may not be zero.
+SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS
spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS
total_bytes, plugin_sent_txns, plugin_sent_bytes > 0 AS sent_bytes,
plugin_filtered_bytes >= 0 AS filtered_bytes FROM
pg_stat_replication_slots ORDER BY slot_name;
In comment either we can say plugin_sent_txns instead of sent_txns or
in the query we can fetch plugin_sent_txns AS sent_txns, so that we
can relate comment and query.
4)
+ <literal>sentTxns</literal> is the number of transactions sent downstream
+ by the output plugin. <literal>sentBytes</literal> is the amount of data
+ sent downstream by the output plugin.
+ <function>OutputPluginWrite</function> is expected to update this counter
+ if <literal>ctx->stats</literal> is initialized by the output plugin.
+ <literal>filteredBytes</literal> is the size of changes in bytes that are
+ filtered out by the output plugin. Function
+ <literal>ReorderBufferChangeSize</literal> may be used to find
the size of
+ filtered <literal>ReorderBufferChange</literal>.
+ </para>
Either we can mention units as 'bytes' for both filteredBytes and
sentBytes or for none. Currently filteredBytes says 'in bytes' while
sentBytes does not.
thanks
Shveta
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Kapila | 2025-09-18 05:29:14 | Re: Reword messages using "as" instead of "because" |
Previous Message | Peter Smith | 2025-09-18 05:07:36 | Re: [WIP]Vertical Clustered Index (columnar store extension) - take2 |