From: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> |
---|---|
To: | Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Report reorder buffer size |
Date: | 2025-08-21 13:56:41 |
Message-ID: | CAExHW5stjbY_jqcP=PAj6w93dX_zTMtjrtm2S7bZvcjQkL81dw@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi Bertrand,
Thanks for your response. I am glad that you have found the proposal
to be generally useful.
On Thu, Aug 14, 2025 at 3:50 PM Bertrand Drouvot
<bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
>
> > Using these trendlines, a user may decide to keep
> > logical_decoding_work_mem to minimum and thus let all the large
> > transactions spill to disk or be streamed.
>
> Right. Also, the issue with spill files is that when they are read back from disk
> then the reorder buffer can consume a lot of memory. That's why I think tracking
> the xid, sub xid and LSNs could be useful too. That could help diagnose why
> it was using that memory.
When the replication is stuck or is slow or WAL builds up a detailed
look at the reorder buffer's contents would be useful. It will need to
be done outside of pg_stat_replication or pg_stat_replication_slots. I
think it will be useful to provide a function which will trigger a
given WAL sender to dump the information about the contents of the
reorder buffer to the server error log or send them to the requesting
client.
>
> I think pg_stat_replication is a good place to record "real time" activities.
>
Ok. Thanks for confirming.
> Maybe we could log 2 metrics: the reorder buffer size before creating the spill
> files and the reorder buffer size while reading back the spill files? That would
> help make the distinction when logical_decoding_work_mem is involved (i.e before
> creating the spill files) and when it is not (reading them back).
>
> pg_stat_replication_slots could be used to add some counters too: like the
> number of times logical_decoding_work_mem has been reached while decoding from
> that slot.
I think pg_stat_replication_slots::spill_count and
pg_stat_replication_slots::stream_count give that value.
> Also maybe record the max value the reorder buffer reached and
> the associated xid, number of sub xids, and LSN? That could be useful to diagnose
> high memory consumption and/or OOM issues due to logical decoding.
A function which dumps the information about reorder buffers can be
used along with the trendline for this purpose. For example, if a user
notices WAL accumulation and an upward trend in the metric exposed by
my patch, they can trigger a dump of reorder buffer contents at that
time and diagnose the problem and even fix it proactively.
>
> Also, should we log some of information to the server log?
>
Information in server error logs is hard to access and analyse,
especially in cloud environment. Having a SQL construct is better.
> > But before I work on those, I would like to know others' opinions,
> > usefulness and acceptance.
>
> Thanks for working on it. I like the idea and also think that is an area where
> more details/metrics should be provided.
Thanks.
--
Best Wishes,
Ashutosh Bapat
From | Date | Subject | |
---|---|---|---|
Next Message | Nathan Bossart | 2025-08-21 13:56:48 | Re: Don't treat virtual generated columns as missing statistics in vacuumdb --missing-stats-only |
Previous Message | Nathan Bossart | 2025-08-21 13:44:01 | Re: vacuumdb --missing-stats-only and permission issue |