Re: Report bytes and transactions actually sent downtream

From: Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: ashutosh(dot)bapat(dot)oss(at)gmail(dot)com, michael(at)paquier(dot)xyz, amit(dot)kapila16(at)gmail(dot)com, bertranddrouvot(dot)pg(at)gmail(dot)com, andres(at)anarazel(dot)de, shveta(dot)malik(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Report bytes and transactions actually sent downtream
Date: 2026-06-15 08:22:36
Message-ID: CAE9k0Png8jDd-UHLy8c1Vi55RkH5p=fyo7eWu+Dp-0KRqXakFw@mail.gmail.com
Lists: pgsql-hackers

Hi,

On Mon, Jun 15, 2026 at 11:49 AM Kyotaro Horiguchi
<horikyota(dot)ntt(at)gmail(dot)com> wrote:
>
> At Mon, 15 Jun 2026 09:48:51 +0530, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote in
> > As I have explained in [1], total_bytes indicates the amount of data
> > added to the reorder buffer. It does not indicate the amount of data
> > in logical form sent downstream. The system which triggered this issue
> > used Debezium as the downstream. The customer wanted to configure
> > Debezium so that it can consume the logical changes in real time. But
> > they had no clue about the amount of logical changes they received
> > from upstream. total_bytes does not help since it's the amount of WAL
> > added to reorder buffer; not the amount of logical changes sent
> > downstream. Hence proposal to add new column sent_bytes.
> >
> > I hope this helps.
> >
> > [1] https://www.postgresql.org/message-id/flat/CAExHW5s6KntzUyUoMbKR5dgwRmd=
> > V2Ay_2%2BAnTgYGAzo%3DQv61wA%40mail.gmail.com
>
> Thanks, that clarifies the use case.
>
> If the goal is to estimate the volume of data that downstream
> consumers such as Debezium need to process, I'm still not sure why
> this necessarily needs to be the amount of logical change data rather
> than the amount of data actually sent over the replication connection.
> For monitoring or capacity-planning purposes, wire bytes seem like
> they would provide a very similar signal. The protocol overhead is
> relatively small, especially when the traffic volume is high, and the
> definition is somewhat more straightforward since it corresponds
> directly to the amount of data transmitted downstream.
>
> Also, I'm not sure that the logical-change size is necessarily a more
> accurate representation of the amount of change being processed.
> Since the amount of logical change data is itself influenced by how
> the output plugin represents changes, it still seems somewhat
> dependent on representation, just in a different way.
>
> Could you explain a bit more about why the logical-change size is the
> important metric here, rather than the number of bytes actually sent?
>

Sorry for chiming in - I may well be misunderstanding this, but here's
how I'm currently thinking about it:

Total transaction bytes refers to the size of decoded transactional
data accumulated in the reorder buffer for a given transaction.

Sent bytes (as I understand from the patch) refers to the size of the
downstream output that the output plugin produces from that decoded
data, after any filtering and format conversion.

To illustrate: if a transaction's decoded changes occupy 550 bytes in
the reorder buffer, but the output plugin filters some out and emits
only 300 bytes downstream, then total transaction bytes = 550 and sent
bytes = 300. Conversely, if all 550 bytes are converted into a more
verbose format and emitted as 700 bytes, total transaction bytes
remains 550 while sent bytes becomes 700.
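
To make the arithmetic above concrete, here is a toy model of the two
counters. It is only a sketch, not code from the patch: SlotStats,
decode_txn and the emit callback are names made up for illustration,
and the split of 550 bytes into two changes of 250 and 300 bytes is an
assumption.

```python
from dataclasses import dataclass


@dataclass
class SlotStats:
    total_bytes: int = 0  # decoded change data accumulated in the reorder buffer
    sent_bytes: int = 0   # bytes the output plugin emits downstream


def decode_txn(stats, change_sizes, emit):
    """Account for one transaction.

    change_sizes: sizes (bytes) of decoded changes in the reorder buffer.
    emit: hypothetical plugin callback returning the output size for a
          change; returning 0 means the change was filtered out.
    """
    for size in change_sizes:
        stats.total_bytes += size
        stats.sent_bytes += emit(size)
    return stats


changes = [250, 300]  # assumed split of the 550 bytes in the example

# Case 1: the plugin filters out the 250-byte change, passes the rest as-is.
filtering = decode_txn(SlotStats(), changes, lambda s: s if s == 300 else 0)

# Case 2: the plugin keeps everything but adds 75 bytes per change.
verbose = decode_txn(SlotStats(), changes, lambda s: s + 75)

print(filtering)  # total_bytes=550, sent_bytes=300
print(verbose)    # total_bytes=550, sent_bytes=700
```

In both cases total_bytes stays at 550; only sent_bytes depends on what
the plugin does with the decoded changes.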

If I'm reading this right, since total bytes for a transaction is the
baseline from which transaction-derived downstream output is produced,
I wonder whether sent bytes should include only that
transaction-derived downstream output, or also downstream protocol
traffic such as keepalive messages, which are sent downstream but are
not derived from transaction bytes in the reorder buffer.

My instinct is that if sent bytes is meant to measure
transaction-output throughput, keepalive messages probably shouldn't
be included, since they are not derived from transaction data and
would distort any comparison with total bytes. But I could be wrong -
happy to be corrected!
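
A rough sketch of the distortion I have in mind, again with made-up
names and numbers: the function sent_bytes_for, the count of ten
keepalives and the 18-byte keepalive size are all assumptions for
illustration, not values taken from the patch.

```python
def sent_bytes_for(txn_output_sizes, keepalive_sizes, include_keepalives):
    """Return a hypothetical sent-bytes figure under either accounting rule."""
    total = sum(txn_output_sizes)
    if include_keepalives:
        total += sum(keepalive_sizes)
    return total


total_bytes = 550      # decoded data in the reorder buffer (example above)
txn_output = [300]     # plugin output derived from that transaction
keepalives = [18] * 10 # assumed: ten keepalives of 18 bytes each

with_keepalives = sent_bytes_for(txn_output, keepalives, True)
without_keepalives = sent_bytes_for(txn_output, keepalives, False)

# Ratio of sent bytes to total bytes under each rule.
print(with_keepalives, round(with_keepalives / total_bytes, 2))
print(without_keepalives, round(without_keepalives / total_bytes, 2))
```

On an idle or lightly loaded slot the keepalive share could dominate,
which is the case where the comparison with total bytes would be the
most misleading.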

--
With Regards,
Ashutosh Sharma.
