From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com> |
Cc: | shveta malik <shveta(dot)malik(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Add memory_limit_hits to pg_stat_replication_slots |
Date: | 2025-09-24 17:11:20 |
Message-ID: | CAD21AoBPKarxZNnSy884LzHtwO6QN0XUqRzSaM-sM79oPT=-UA@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Sep 23, 2025 at 11:31 PM Bertrand Drouvot
<bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
>
> Hi,
>
> On Tue, Sep 23, 2025 at 11:39:22AM -0700, Masahiko Sawada wrote:
> > On Tue, Sep 23, 2025 at 1:52 AM Bertrand Drouvot
> > <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> > >
> > > However, technically speaking, "exceeded" is not the perfect wording since
> > > the code was doing (and is still doing something similar to):
> > >
> > > if (debug_logical_replication_streaming == DEBUG_LOGICAL_REP_STREAMING_BUFFERED &&
> > > - rb->size < logical_decoding_work_mem * (Size) 1024)
> > > + !memory_limit_reached)
> > > return;
> > >
> > > as the comment describes correctly using "reached":
> > >
> > > /*
> > > * Check whether the logical_decoding_work_mem limit was reached, and if yes
> > > * pick the largest (sub)transaction at-a-time to evict and spill its changes to
> > > * disk or send to the output plugin until we reach under the memory limit.
> > >
> > > So I think that "reached" or "hit" would be better wording. However, the
> > > documentation for spill_txns, stream_txns already use "exceeded" (and not "reached")
> > > so I went with "exceeded" for consistency. I think that's fine, if not we may want
> > > to use "reached" for those 3 stats descriptions.
> >
> > I find "exceeded" is fine as the documentation for logical decoding
> > also uses it[1]:
> >
> > Similar to spill-to-disk behavior, streaming is triggered when the
> > total amount of changes decoded from the WAL (for all in-progress
> > transactions) exceeds the limit defined by logical_decoding_work_mem
> > setting.
>
> Yes it also uses "exceeds" but I think it's not 100% accurate. It would be
> if, in ReorderBufferCheckMemoryLimit, we were using "<=" instead of "<" in:
>
> if (debug_logical_replication_streaming == DEBUG_LOGICAL_REP_STREAMING_BUFFERED &&
> rb->size < logical_decoding_work_mem * (Size) 1024)
>
> I think an accurate wording would be "reaches or exceeds" in all those places,
> but just using "exceeds" looks good enough.
>
> > One comment for the v3 patch:
> >
> > + <para>
> > + Number of times <literal>logical_decoding_work_mem</literal> has been
> > + exceeded while decoding changes from WAL for this slot.
> > + </para>
> >
> > How about rewording it to like:
> >
> > Number of times the memory used by logical decoding has exceeded
> > logical_decoding_work_mem.
>
> That sounds better, thanks! Used this wording in v4 attached (that's the only
> change as compared to v3).
>
Thank you for updating the patch! Here are some comments:
---
+ bool memory_limit_reached = (rb->size >=
logical_decoding_work_mem * (Size) 1024);
+
+ if (memory_limit_reached)
+ rb->memExceededCount += 1;
Do we want to use 'exceeded' for the variable too for better consistency?
---
One thing I want to clarify is that even if the memory usage exceeds
the logical_decoding_work_mem it doesn't necessarily mean we serialize
or stream transactions because of
ReorderBufferCheckAndTruncateAbortedTXN(). For example, in a situation
where many large already-aborted transactions are truncated by
transactionsReorderBufferCheckAndTruncateAbortedTXN(), users would see
a high number in mem_exceeded_count column but it might not actually
require any adjustment for logical_decoding_work_mem. One idea is to
increment that counter if exceeding memory usage is caused to
serialize or stream any transactions. On the other hand, it might make
sense and be straightforward too to show a pure statistic that the
memory usage exceeded the logical_decoding_work_mem. What do you
think?
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From | Date | Subject | |
---|---|---|---|
Next Message | Fujii Masao | 2025-09-24 17:19:27 | Re: Suggestion to add --continue-client-on-abort option to pgbench |
Previous Message | Melanie Plageman | 2025-09-24 17:07:46 | Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) |