| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
|---|---|
| To: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
| Cc: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, ashu(dot)coek88(at)gmail(dot)com, michael(at)paquier(dot)xyz, bertranddrouvot(dot)pg(at)gmail(dot)com, andres(at)anarazel(dot)de, shveta(dot)malik(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: Report bytes and transactions actually sent downtream |
| Date: | 2026-06-28 06:56:06 |
| Message-ID: | CAA4eK1KVQfQtdYKocfsxB1njEQGx4fhMX4N0bjmozG7Sx_V8Sw@mail.gmail.com |
| Lists: | pgsql-hackers |
On Thu, Jun 25, 2026 at 4:37 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Tue, Jun 16, 2026 at 2:06 AM Ashutosh Bapat
> <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
> >
> >
> > Those are logical message types which are part of the logical change
> > data - without those messages it's not possible to process the logical
> > change data. So they are included. But the keepalive messages, for
> > example, aren't part of the logical change data.
>
> I think logical replication messages like STREAM START/STREAM STOP and
> BEGIN/END, versus messages like keepalive and standby/primary status
> updates, operate at different layers. The former are the contents of
> the logical output and seem to belong naturally to the replication
> slot's statistics. I'm not sure the latter should be included as well
> -- I'm concerned that counting those bytes would become noise when
> analyzing the statistics over time, since they have no relation to the
> volume of logical changes.
>
You and Ashutosh seem to be making similar points, and they sound
reasonable, so let's do it that way.

> If we do want a statistic showing the literal total bytes sent
> downstream including protocol messages, ISTM that should be available
> for both logical and physical replication: physical replication also
> uses keepalive messages and adds a header to each message. In other
> words, that kind of "bytes on the wire" metric isn't really specific
> to a logical replication slot, so the slot's statistics don't seem
> like the right place for it.
>
> The proposed column name 'sent_bytes' is also confusing to me, because
> I don't think we can call it "total bytes actually sent" in the
> logical decoding SQL API case. A name like 'plugin_total_bytes' seems
> more straightforward and conveys the intent that protocol messages are
> not included.
>
The SQL API point is valid: if we display sent_bytes via the SQL API,
then pg_logical_slot_get_changes() will show a nonzero sent_bytes even
though nothing was ever sent anywhere. OTOH, adding a plugin_* prefix
makes it sound like the stats are plugin-specific. How about calling
it 'output_bytes'? It pairs cleanly with the existing column:
total_bytes = decoded into the reorder buffer, output_bytes = decoded
and handed to the consumer.

If we use output_bytes, then we can describe the new stat in the docs
along the lines of the following text:

<para>
Amount of decoded data produced for this slot's consumer by the output
plugin, after applying any output plugin filters and converting the
changes into the output plugin's format. This counts the transaction
changes together with the messages that delimit them (such as the
begin and commit messages). Connection-management messages such as
keepalives are not counted, because they are generated by the server
rather than by the output plugin.
</para>
<para>
This value can differ from <structfield>total_bytes</structfield>: it
may be smaller because filtered changes are not output, or larger
because the output plugin's format can be more verbose than the
decoded changes. For these reasons
<structfield>output_bytes</structfield> is not directly comparable to
<structfield>total_bytes</structfield>.
</para>
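
To make the intended accounting concrete, here is a rough toy model in
Python. It is only a sketch of the semantics described above, not the
patch and not any PostgreSQL API: the message kinds, SlotStats,
decode_change(), emit() and run_example() are all made-up names, and
the byte sizes are arbitrary.

```python
from dataclasses import dataclass

# Hypothetical message kinds, for illustration only (not PostgreSQL identifiers).
PLUGIN_KINDS = {"begin", "change", "commit", "stream_start", "stream_stop"}
SERVER_KINDS = {"keepalive"}


@dataclass
class SlotStats:
    total_bytes: int = 0   # decoded into the reorder buffer
    output_bytes: int = 0  # produced by the output plugin for the consumer


def decode_change(stats: SlotStats, decoded_size: int) -> None:
    """Account for a change decoded into the reorder buffer."""
    stats.total_bytes += decoded_size


def emit(stats: SlotStats, kind: str, payload: bytes) -> None:
    """Account for a message that is about to be handed to the consumer."""
    if kind in PLUGIN_KINDS:
        # Output plugin data, including the messages that delimit changes.
        stats.output_bytes += len(payload)
    elif kind not in SERVER_KINDS:
        raise ValueError(f"unknown message kind: {kind}")
    # Server-generated messages (keepalives) are deliberately not counted.


def run_example() -> SlotStats:
    stats = SlotStats()
    decode_change(stats, 120)           # change that the plugin will output
    decode_change(stats, 80)            # change that the plugin filters out
    emit(stats, "begin", b"B" * 10)
    emit(stats, "change", b"x" * 150)   # plugin format more verbose than decoded
    emit(stats, "commit", b"C" * 10)
    emit(stats, "keepalive", b"k" * 18)  # ignored by output_bytes
    return stats
```

In this made-up run total_bytes ends at 200 and output_bytes at 170.
The filtered change pulls output_bytes down while the more verbose
plugin format pushes it up, which is why the two columns are not
directly comparable.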
--
With Regards,
Amit Kapila.