From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Report bytes and transactions actually sent downtream |
Date: | 2025-07-13 11:04:14 |
Message-ID: | CAA4eK1JyXi9Ogt9=DuTnnww8tqSTn3RFWmu6cX+qcGR1jYXOYw@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, Jul 1, 2025 at 7:35 PM Ashutosh Bapat
<ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
>
> On Tue, Jul 1, 2025 at 4:23 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Mon, Jun 30, 2025 at 3:24 PM Ashutosh Bapat
> > <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
> > >
> > > Hi All,
> > > In a recent logical replication issue, there were multiple replication
> > > slots involved, each using a different publication. Thus the amount of
> > > data that was replicated through each slot was expected to be
> > > different. However, total_bytes and total_txns were reported the same
> > > for all the replication slots as expected. One of the slots started
> > > lagging and we were trying to figure out whether it's the WAL sender
> > > slowing down or the consumer (in this case Debezium). The lagging
> > > slot then showed total_txns and total_bytes lower than other slots
> > > giving an impression that the WAL sender is processing the data
> > > slowly. Had pg_stat_replication_slots reported the amount of data
> > > actually sent downstream, it would have been easier to compare it with
> > > the amount of data received by the consumer and thus pinpoint the
> > > bottleneck.
> > >
> > > Here's a patch to do the same. It adds two columns
> > > - sent_txns: The total number of transactions sent downstream.
> > > - sent_bytes: The total number of bytes sent downstream in data messages
> > > to pg_stat_replication_slots. sent_bytes includes only the bytes sent
> > > as part of 'd' messages and does not include keep alive messages or
> > > CopyDone messages for example. But those are very few and can be
> > > ignored. If others feel that those are important to be included, we
> > > can make that change.
> > >
> > > Plugins may choose not to send an empty transaction downstream. It's
> > > better to increment sent_txns counter in the plugin code when it
> > > actually sends a BEGIN message, for example in pgoutput_send_begin()
> > > and pg_output_begin(). This means that every plugin will need to be
> > > modified to increment the counter for it to be reported correctly.
> > >
> >
> > What if some plugin doesn't implement it or implements it incorrectly?
> > Users will then complain that the PG view is showing an incorrect value.
>
> That is right.
>
> To fix the problem of plugins not implementing the counter increment
> logic we could use logic similar to how we track whether
> OutputPluginPrepareWrite() has been called or not. In
> ReorderBufferTXN, we add a new member sent_status which would be an
> enum with 3 values UNKNOWN, SENT, NOT_SENT. Initially the sent_status
> = UNKNOWN. We provide a function called
> plugin_sent_txn(ReorderBufferTXN *txn, bool sent) which will set
> sent_status = SENT when sent = true and sent_status = NOT_SENT when
> sent = false. In all the end transaction callback wrappers like
> commit_cb_wrapper(), prepare_cb_wrapper(), stream_abort_cb_wrapper(),
> stream_commit_cb_wrapper() and stream_prepare_cb_wrapper(), if
> sent_status = UNKNOWN, we throw an error.
>
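For reference, the tri-state tracking proposed above could look roughly
like the standalone sketch below. SketchTxn is a stand-in for
ReorderBufferTXN, and the enum names and end_txn_wrapper_check() are
hypothetical; none of this exists in PostgreSQL today. The check returns
false where the proposal would throw an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tri-state from the proposal; not part of PostgreSQL. */
typedef enum TxnSentStatus
{
    TXN_SENT_UNKNOWN = 0,       /* plugin has not reported anything yet */
    TXN_SENT,                   /* plugin sent the transaction downstream */
    TXN_NOT_SENT                /* plugin chose to skip the transaction */
} TxnSentStatus;

/* Stand-in for ReorderBufferTXN, reduced to the proposed member. */
typedef struct SketchTxn
{
    TxnSentStatus sent_status;
} SketchTxn;

/* Called by the plugin to report whether it sent the transaction. */
void
plugin_sent_txn(SketchTxn *txn, bool sent)
{
    assert(txn != NULL);
    txn->sent_status = sent ? TXN_SENT : TXN_NOT_SENT;
}

/*
 * Check that an end-of-transaction wrapper would perform. Returns false
 * when the plugin never reported a status (the proposal would raise an
 * error here); otherwise bumps *sent_txns for transactions that were sent.
 */
bool
end_txn_wrapper_check(const SketchTxn *txn, long *sent_txns)
{
    assert(txn != NULL && sent_txns != NULL);

    if (txn->sent_status == TXN_SENT_UNKNOWN)
        return false;
    if (txn->sent_status == TXN_SENT)
        (*sent_txns)++;
    return true;
}
```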
I think we don't want to make it mandatory for plugins to implement
these stats, so instead of throwing an ERROR, the view should show that
the plugin doesn't provide stats. How about having an OutputPluginStats
member in LogicalDecodingContext, similar to the OutputPluginCallbacks
and OutputPluginOptions members? It will have members like
stats_available, txns_sent or txns_skipped, txns_filtered, etc. I think
it would be better to provide this information in a separate view, say
pg_stat_plugin_stats, where we can report slot_name, plugin_name, and
then the other stats we want to implement as part of OutputPluginStats.
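
To make the idea a bit more concrete, here is a rough standalone sketch.
The field names are the ones suggested above; the types, the
SketchDecodingContext stand-in for LogicalDecodingContext, and
format_plugin_stats_row() are assumptions made only for illustration.
The point it shows is that a plugin which leaves stats_available unset
yields NULLs in the view rather than an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical struct; field names from the email, types assumed. */
typedef struct OutputPluginStats
{
    bool        stats_available;    /* set by plugins that maintain stats */
    int64_t     txns_sent;
    int64_t     txns_skipped;
    int64_t     txns_filtered;
} OutputPluginStats;

/* Stand-in for the relevant parts of LogicalDecodingContext. */
typedef struct SketchDecodingContext
{
    const char *slot_name;
    const char *plugin_name;
    OutputPluginStats stats;
} SketchDecodingContext;

/*
 * Format one row of the hypothetical pg_stat_plugin_stats view as
 * slot_name|plugin_name|txns_sent|txns_skipped|txns_filtered.
 * Counters are shown as NULL when the plugin does not provide stats.
 */
int
format_plugin_stats_row(const SketchDecodingContext *ctx,
                        char *buf, size_t buflen)
{
    assert(ctx != NULL && buf != NULL);

    if (!ctx->stats.stats_available)
        return snprintf(buf, buflen, "%s|%s|NULL|NULL|NULL",
                        ctx->slot_name, ctx->plugin_name);

    return snprintf(buf, buflen, "%s|%s|%lld|%lld|%lld",
                    ctx->slot_name, ctx->plugin_name,
                    (long long) ctx->stats.txns_sent,
                    (long long) ctx->stats.txns_skipped,
                    (long long) ctx->stats.txns_filtered);
}
```

A plugin that does maintain the counters would set stats_available once
at startup and bump the counters where it decides to send or skip a
transaction.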
--
With Regards,
Amit Kapila.