Re: Report bytes and transactions actually sent downtream

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Subject: Re: Report bytes and transactions actually sent downtream
Date: 2025-09-22 05:14:31
Message-ID: CAJpy0uCVmpn_Pu_=fYYosc8C2mL1abwC=LaUj88AK6=ZMMiycQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Sep 19, 2025 at 8:11 PM Ashutosh Bapat
<ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
>
> On Fri, Sep 19, 2025 at 11:48 AM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> >
> > On Thu, Sep 18, 2025 at 3:54 PM Ashutosh Bapat
> > <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
> > >
> > >
> > > >
> > > > Few comments:
> > > >
> > > > 1)
> > > > postgres=# select slot_name,
> > > > total_bytes,plugin_filtered_bytes,plugin_sent_bytes from
> > > > pg_stat_replication_slots order by slot_name;
> > > > slot_name | total_bytes | plugin_filtered_bytes | plugin_sent_bytes
> > > > -----------+-------------+-----------------------+-------------------
> > > > slot1 | 800636 | 793188 | 211
> > > > sub1 | 401496 | 132712 | 84041
> > > > sub2 | 401496 | 396184 | 674
> > > > sub3 | 401496 | 145912 | 79959
> > > > (4 rows)
> > > >
> > > > Currently it looks quite confusing. 'total_bytes' gives a sense that
> > > > it has to be a sum of filtered and sent. But they are no way like
> > > > that. In the thread earlier there was a proposal to change the name to
> > > > reordered_txns, reordered_bytes. That looks better to me. It will give
> > > > clarity without even someone digging into docs.
> > >
> > > I also agree with that. But that will break backward compatibility.
> >
> > Yes, that it will do.
> >
> > > Do
> > > you think other columns like spill_* and stream_* should also be
> > > renamed with the prefix "reordered"?
> > >
> >
> > Okay, I see that all fields in pg_stat_replication_slots are related
> > to the ReorderBuffer. On reconsideration, I’m unsure whether it's
> > appropriate to prefix all of them with reorderd_. For example,
> > renaming spill_bytes and stream_bytes to reordered_spill_bytes and
> > reordered_stream_bytes. These names start to feel overly long, and I
> > also noticed that ReorderBuffer isn’t clearly defined anywhere in the
> > documentation (or at least I couldn’t find it), even though the term
> > 'reorder buffer' does appear in a few places.
> >
> > As an example, see ReorderBufferRead, ReorderBufferWrite wait-types
> > at [1]. Also in plugin-doc [2], we use 'ReorderBufferTXN'. And now, we
> > are adding: ReorderBufferChangeSize, ReorderBufferChange
> >
> > This gives me a feeling, will it be better to let
> > pg_stat_replication_slots as is and add a brief ReorderBuffer section
> > under Logical Decoding concepts [3] just before Output Plugins. And
> > then, pg_stat_replication_slots can refer to that section, clarifying
> > that the bytes, counts, and txn fields pertain to ReorderBuffer
> > (without changing any of the fields).
> >
> > And then to define plugin related data, we can have a new view, say
> > pg_stat_plugin_stats (as Amit suggested earlier) or
> > pg_stat_replication_plugins. I understand that adding a new view might
> > not be desirable, but it provides better clarity without requiring
> > changes to the existing fields in pg_stat_replication_slots. I also
> > strongly feel that to properly tie all this information together, a
> > brief definition of the ReorderBuffer is needed. Other pages that
> > reference this term can then point to that section. Thoughts?
>
> Even if we keep two views, when they are joined, users will still get
> confused by total_* names. So it's not solving the underlying problem.

Okay, I see your point.

> Andres had raised the point about renaming total_* fields with me
> off-list earlier. He suggested names total_wal_bytes, and
> total_wal_txns in an off list discussion today. I think those convey
> the true meaning - that these are txns and bytes that come from WAL.

I agree.

> Used those in the attached patches. Prefix reordered would give away
> lower level details, so I didn't use it.
>
> I agree that it would be good to mention ReorderBuffer in the logical
> decoding concepts section since it mentions structures ReorderBuffer*.
> But that would be a separate patch since we aren't using "reordered"
> in the names of the fields.

Okay.

> 0001 is the previous patch
> 0002 changes addressing your and Bertrand's comments.
>

Few trivial comments:

1)
Currently the doc says:

sentTxns is the number of transactions sent downstream by the output
plugin. sentBytes is the amount of data, in bytes, sent downstream by
the output plugin. OutputPluginWrite will update this counter if
ctx->stats is initialized by the output plugin. filteredBytes is the
size of changes, in bytes, that are filtered out by the output plugin.
Function ReorderBufferChangeSize may be used to find the size of
filtered ReorderBufferChange.

Shall we rearrange it to:

sentTxns is the number of transactions sent downstream by the output
plugin. sentBytes is the amount of data, in bytes, sent downstream by
the output plugin. filteredBytes is the size of changes, in bytes,
that are filtered out by the output plugin. OutputPluginWrite will
update these counters if ctx->stats is initialized by the output
plugin.
The function ReorderBufferChangeSize can be used to compute the size
of a filtered ReorderBufferChange, i.e., the filteredBytes.

2)
My preference will be to rename the fields 'total_txns' and
'total_bytes' in PgStat_StatReplSlotEntry to 'total_wal_txns' and
'total_wal_bytes' for better clarity. Additionally, upon rethinking,
it seems better to me that plugin-related fields are also named as
plugin_* to clearly indicate their association. OTOH, in
OutputPluginStats, the field names are fine as is, since the structure
name itself clearly indicates these are plugin-related fields.
PgStat_StatReplSlotEntry lacks such context and thus using full
descriptive names there would improve clarity.

3)
LogicalOutputWrite:
+ if (ctx->stats)
+ ctx->stats->sentBytes += ctx->out->len + sizeof(XLogRecPtr) +
sizeof(TransactionId);
p->returned_rows++;

A blank line after the new change will increase readability.

~~

In my testing, the patch works as expected. Thanks!

thanks
Shveta

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Sharma 2025-09-22 05:28:54 Re: How can end users know the cause of LR slot sync delays?
Previous Message Fujii Masao 2025-09-22 04:42:03 Re: Trivial fix for comment of function table_tuple_lock