From: | Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com> |
---|---|
To: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> |
Cc: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Add memory_limit_hits to pg_stat_replication_slots |
Date: | 2025-10-06 06:52:24 |
Message-ID: | aONnKBgNIUJdINSP@ip-10-97-1-34.eu-west-3.compute.internal |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On Mon, Oct 06, 2025 at 10:50:52AM +0530, Ashutosh Bapat wrote:
> On Fri, Oct 3, 2025 at 11:48 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Fri, Oct 3, 2025 at 9:26 AM Ashutosh Bapat
> > <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com> wrote:
> > >
> > > On Fri, Oct 3, 2025 at 6:45 PM Bertrand Drouvot
> > > <bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
> > > >
> > > > Hi,
> > > >
> > > > On Fri, Oct 03, 2025 at 05:19:42PM +0530, Ashutosh Bapat wrote:
> > > > > + bool memory_limit_reached = (rb->size >= logical_decoding_work_mem *
> > > > > (Size) 1024);
> > > > > +
> > > > > + if (memory_limit_reached)
> > > > > + rb->memExceededCount += 1;
> > > >
> > > > Thanks for looking at it!
> > > >
> > > > > If the memory limit is hit but no transaction was serialized, the
> > > > > stats won't be updated since UpdateDecodingStats() won't be called. We
> > > > > need to call UpdateDecodingStats() in ReorderBufferCheckMemoryLimit()
> > > > > if no transaction was streamed or spilled.
> > > >
> > > > I did some testing and the stats are reported because UpdateDecodingStats() is
> > > > also called in DecodeCommit(), DecodeAbort() and DecodePrepare() (in addition
> > > > to ReorderBufferSerializeTXN() and ReorderBufferStreamTXN()). That's also why
> > > > ,for example, total_txns is reported even if no transaction was streamed or
> > > > spilled.
> > >
> > > In a very pathological case, where all transactions happen to be
> > > aborted while decoding and yet memory limit is hit many times, nothing
> > > will be reported till first committed transaction after it is decoded.
> > > Which may never happen. I didn't find a call stack where by
> > > UpdateDecodingStats could be reached from
> > > ReorderBufferCheckAndTruncateAbortedTXN().
> >
> > The more we report the status frequently, the less chances we lose the
> > statistics in case of logical decoding being interrupted but the more
> > overheads we have to update the statistics. I personally prefer not to
> > call UpdateDecodingStats() frequently since pgstat_report_replslot()
> > always flush the statistics. If the transaction is serialized or
> > streamed, we can update the memExceededCount together with other
> > statistics such as streamBytes and spillBytes. But if we can free
> > enough memory only by truncating already-aborted transactions, we need
> > to rely on the next committed/aborted/prepared transaction to update
> > the statistics.
Indeed, there is cases when committed/aborted/prepared would not be called
right after ReorderBufferCheckAndTruncateAbortedTXN().
> So how about calling UpdateDecodingStats() only in
> > case where we only truncate aborted transactions and the memory usage
> > gets lower than the limit?
>
> Yes, that's what my intention was.
I also think it makes sense.
> > I've attached the patch that implements this idea with a small
> > refactoring.
Thanks!
> It also has the change to the regression test results
> > we've discussed.
-SELECT slot_name, spill_txns, spill_count, mem_exceeded_count FROM pg_stat_replication_slots WHERE slot_name = 'regression_slot_stats4_twophase';
+SELECT slot_name, spill_txns, spill_count, mem_exceeded_count > 0 as mem_exceeded_count FROM pg_stat_replication_slots WHERE slot_name = 'regression_slot_stats4_twophase';
slot_name | spill_txns | spill_count | mem_exceeded_count
---------------------------------+------------+-------------+--------------------
- regression_slot_stats4_twophase | 0 | 0 | 1
+ regression_slot_stats4_twophase | 0 | 0 | t
Could we also imagine that there are other activities enough to reach the memory
limit and transactions are not aborted, meaning spill_txns and/or spill_count are > 0?
In that case we may want to get rid of this test instead (as checking spill_txns >=0
and spill_count >=0 would not really reflect the intend of this test).
> The change looks good to me.
>
> Given Andres's comment, in a nearby thread, about being cautious about
> adding useless statistics, I think this one needs a bit more
> discussion. In the proposal email Bertant wrote
> > Please find attached a patch to $SUBJECT, to report the number of times the
> > logical_decoding_work_mem has been reached.
> >
> > With such a counter one could get a ratio like total_txns/memory_limit_hits.
> >
> > That could help to see if reaching logical_decoding_work_mem is rare or
> > frequent enough. If frequent, then maybe there is a need to adjust
> > logical_decoding_work_mem.
> >
>
> I agree with the goal that we need a metric to decide whether
> exceeding logical decoding work mem is frequent or not.
>
> > Based on my simple example above, one could say that it might be possible to get
> > the same with:
> >
> > (spill_count - spill_txns) + (stream_count - stream_txns)
> >
> > but that doesn't appear to be the case with a more complicated example (277 vs 247):
> >
> > slot_name | spill_txns | spill_count | total_txns | stream_txns | stream_count | memory_limit_hits | (spc-spct)+(strc-strt)
> > --------------+------------+-------------+------------+-------------+--------------+-------------------+------------------------
> > logical_slot | 405 | 552 | 19 | 5 | 105 | 277 | 247
> > (1 row)
>
> Is there another case where this difference is going to lead to a wrong conclusion?
Yeah, for example when the ratio aborted/committed is high, we could get things
like:
slot_name | spill_txns | spill_count | stream_txns | stream_count | total_txns | mem_exceeded_count | (spc-spct)+(strc-strt)
--------------+------------+-------------+-------------+--------------+------------+--------------------+------------------------
logical_slot | 1 | 2 | 0 | 0 | 192 | 244 | 1
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2025-10-06 07:28:12 | Re: Remove custom redundant full page write description from GIN |
Previous Message | Michael Paquier | 2025-10-06 06:37:18 | Re: Add stats_reset to pg_stat_all_tables|indexes and related views |