Re: Add memory_limit_hits to pg_stat_replication_slots

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Add memory_limit_hits to pg_stat_replication_slots
Date: 2025-09-11 22:24:54
Message-ID: CAD21AoDYPbexAQB+_e34KyQycCCTkhBgRS8Y+To_=kD-z_SG1w@mail.gmail.com
Lists: pgsql-hackers

On Wed, Aug 27, 2025 at 12:26 AM Bertrand Drouvot
<bertranddrouvot(dot)pg(at)gmail(dot)com> wrote:
>
> Hi hackers,
>
> I think that it's currently not always possible to determine how many times
> logical_decoding_work_mem has been reached.
>
> For example, say a transaction is made of 40 subtransactions, and I get:
>
> slot_name | spill_txns | spill_count | total_txns
> --------------+------------+-------------+------------
> logical_slot | 40 | 41 | 1
> (1 row)
>
> Then I know that logical_decoding_work_mem has been reached one time (total_txns).
>
> But as soon as another transaction is decoded (that does not involve spilling):
>
> slot_name | spill_txns | spill_count | total_txns
> --------------+------------+-------------+------------
> logical_slot | 40 | 41 | 2
> (1 row)
>
> Then we don't know if logical_decoding_work_mem has been reached one or two
> times.
>
> Please find attached a patch to $SUBJECT, to report the number of times the
> logical_decoding_work_mem has been reached.
>
> With such a counter one could get a ratio like total_txns/memory_limit_hits.
>
> That could help to see if reaching logical_decoding_work_mem is rare or
> frequent enough. If frequent, then maybe there is a need to adjust
> logical_decoding_work_mem.
>
> Based on my simple example above, one could say that it might be possible to get
> the same with:
>
> (spill_count - spill_txns) + (stream_count - stream_txns)
>
> but that doesn't appear to be the case with a more complicated example (277 vs 247):
>
> slot_name | spill_txns | spill_count | total_txns | stream_txns | stream_count | memory_limit_hits | (spc-spct)+(strc-strt)
> --------------+------------+-------------+------------+-------------+--------------+-------------------+------------------------
> logical_slot | 405 | 552 | 19 | 5 | 105 | 277 | 247
> (1 row)
>
> Not sure I like memory_limit_hits that much, maybe work_mem_exceeded is better?
>
> Looking forward to your feedback,

Yes, those are quite different situations: spilling 100 transactions in
a single ReorderBufferCheckMemoryLimit() call versus spilling 1
transaction in each of 100 ReorderBufferCheckMemoryLimit() calls, even
though spill_txns is 100 in both cases. And we don't have any
statistics to distinguish between these cases. So I agree with adding
this statistic.

One minor comment is:

@@ -1977,6 +1978,7 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
repSlotStat.stream_bytes = rb->streamBytes;
repSlotStat.total_txns = rb->totalTxns;
repSlotStat.total_bytes = rb->totalBytes;
+ repSlotStat.memory_limit_hits = rb->memory_limit_hits;

Since the other statistics counter names in ReorderBuffer are camel
case, I think it's better to follow that convention for the new counter.
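Concretely, the rename this comment suggests might look like the following (a sketch; memoryLimitHits is a hypothetical field name chosen only to match the existing camel-case convention of the other ReorderBuffer counters):

```
-   repSlotStat.memory_limit_hits = rb->memory_limit_hits;
+   repSlotStat.memory_limit_hits = rb->memoryLimitHits;
```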

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
