Re: Report reorder buffer size

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Report reorder buffer size
Date: 2025-08-25 20:47:05
Message-ID: CAD21AoDVhtDugdR2pR9yyuo8XKn0mo0gR5arqOZBKK_8sOe8sg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Wed, Aug 13, 2025 at 5:40 AM Ashutosh Bapat
<ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
>
> Hi All,
> The question of how to tune logical_decoding_work_mem has come up a
> few times in the context of customer cases. If it is too low then the
> changes are spilled to disk, slowing down replication and consuming
> disk space. In case of streamed transactions, it won't consume space
> on primary but will consume resources (transactional resources/disk
> space) on replica. If decoding memory is too high, it may cause memory
> pressure especially when there are many wal senders and large amounts
> of changes queued to reorder buffers frequently.

Thank you for starting the thread about this topic. This is one of the
topics that I've been eager to address. Here are some random comments:

>
> In order to tune logical_decoding_work_mem optimally we need a measure
> of the size of changes belonging to the transactions queued in the
> reorder buffer, whether spilled to the disk or in memory or sent
> downstream as part of an unfinished streamed transaction. Currently
> there is no way to determine that. pg_stat_replication_slots reports
> cumulative statistics about spilled or streamed transactions. It helps
> to know whether the logical decoding work memory is sufficient or not
> to hold all the changes. But it does not provide a way to estimate an
> optimal value for it since it doesn't capture the state of the reorder
> buffer at a given point in time.
>
> With the attached WIP patch, we can report, at a given point in time,
> total size of reorder buffer i.e. sum of the size of all the changes
> belonging to the transactions active in the reorder buffer whether
> those changes are in memory, on disk or sent downstream as part of an
> unfinished streamed transaction. By sampling this value at regular
> intervals, one can observe the reorder buffer trendline as shown in
> the plots attached.

Alternative idea (or an additional metric) would be the high-watermark
of memory usage. That way, users won't miss memory usage spikes that
might be missed by sampling the total memory usage.

> Tracking total size of reorder buffer also helps to quantify load on a
> wal sender. It is an additional metric that can be useful to debug WAL
> buildup, slow replication etc. For example, the plot clearly and
> directly shows when the wal sender started processing the large
> transaction, when it finished etc.

Agreed.

> The patch is not complete. There are many things to be worked on
> 1. Whether the metrics should be reported in pg_stat_replication or
> pg_stat_replication_slots. Given that it's a point-in-time metric, it
> fits pg_stat_replication better. But that view does not report logical
> decoding activity happening in backends other than the wal senders.
> But that's true with the other metrics in that view as well.
> pg_stat_replication_slots, however, covers all users of logical
> decoding. For now I have used pg_stat_replication, but it can be
> changed based on the discussion here.
>
> 2. Is the size of the reorder buffer enough or we want to also track
> the size of changes on disk and the size of changes sent downstream as
> part of unfinished streamed transactions separately? Also the number
> of transactions being tracked by the reorder buffer?

For the purpose of tuning logical_decoding_work_mem, the additional
metrics and statistics we need might not be many. But in order to make
logical decoding more visible for users for debugging or diagnosing
purposes, more statistics like the number of transactions being
tracked by the reorder buffer might be required.

We need to note that the actual size of changes sent downstream
actually depends on logical decoding plugins. For instance, we have
table, row, and column filters in logical replication cases. It might
be worth considering providing such statistics too. The statistics
like the number of changes filtered out by the table filters or the
change-kind filter (e.g., only publishing INSERTs etc) might be
helpful for users to confirm the effectiveness of the filters they
set.

> If we are going
> to report so much statistics about the contents of the reorder buffer,
> is it better to have a separate view pg_stat_reorder_buffer for the
> same?

Given logical decoding can be used also by regular backend processes,
I guess that such dynamic metrics would fit a system view dedicated to
logical decoding, say pg_stat_reorder_buffer or
pg_stat_logical_decoding.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2025-08-25 20:58:24 Re: [BUG] Remove self joins causes 'variable not found in subplan target lists' error
Previous Message Nathan Bossart 2025-08-25 20:33:08 Re: Improve LWLock tranche name visibility across backends