Re: memory leak in logical WAL sender with pgoutput's cachectx

From: Xuneng Zhou <xunengzhou(at)gmail(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: 赵宇鹏(宇彭) <zhaoyupeng(dot)zyp(at)alibaba-inc(dot)com>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: memory leak in logical WAL sender with pgoutput's cachectx
Date: 2025-08-14 12:06:52
Message-ID: CABPTF7V_s-DmX78+Q-cM_9Fj+4_4ozNhZstecZLUGMOZd7kvqA@mail.gmail.com
Lists: pgsql-hackers

Hi,

Thanks for the patch.

On Thu, Aug 14, 2025 at 6:39 PM Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> Dear Zhao,
>
> Thanks for raising the issue.
>
> > If you run the script and then call MemoryContextStats(TopMemoryContext), you
> > will see something like:
> > "logical replication cache context: 562044928 total in 77 blocks;"
> > meaning “cachectx” has grown to ~500 MB, and it keeps growing as the number
>
> I also ran your script with count=1000 and confirmed that cachectx had grown
> to around 50MB:
>
> ```
> logical replication cache context: 58728448 total in 17 blocks; 3239568 free (62 chunks); 55488880 used
> ```
>
> > cachectx is used mainly for entry->old_slot, entry->new_slot and entry->attrmap
> > allocations. When a DROP TABLE causes an invalidation we only set
> > entry->replicate_valid = false; we do not free those allocations immediately.
> > They are freed only if the same entry is used again. In some workloads an entry
> > may never be reused, or it may be reused briefly and then become unreachable
> > forever (The WAL sender may still need to decode WAL records for tables that
> > have already been dropped while it is processing the invalidation.)
>
> So, your suggestion is that we should sometimes free allocated memory for them,
> right? Valid point.
>
> > Given the current design I don’t see a simple fix. Perhaps RelationSyncCache
> > needs some kind of eviction/cleanup policy to prevent this memory growth in
> > such scenarios.
> >
> > Does anyone have ideas or suggestions?
>
> Naively, the relsync cache could be cleaned up once entries have been
> invalidated many times. The attached patch implements this idea. It reduced the
> memory used in my environment:
>
> ```
> logical replication cache context: 1056768 total in 8 blocks; 556856 free (51 chunks); 499912 used
> ```
>
> Can you verify that?
>

I had a quick look at it and have some questions.
Is it safe to free the entry's substructures from within
rel_sync_cache_relation_cb()?
I’m also interested in the reasoning behind setting
NINVALIDATION_THRESHOLD to 100.
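
To make the first question more concrete, here is a standalone toy model in
plain C. It is not PostgreSQL code and makes no claim about what the attached
patch does; every name in it (ToyCache, toy_invalidate_cb, toy_sweep_if_needed,
TOY_INVALIDATION_THRESHOLD, and so on) is hypothetical, and malloc/free stand in
for allocations in cachectx. It only illustrates one possible shape of a
threshold-based cleanup in which the invalidation callback marks and counts
entries, and the actual freeing is deferred to a point where the caller knows
no entry is in use:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_CACHE_SIZE 8
/* Hypothetical threshold; the value is illustrative only. */
#define TOY_INVALIDATION_THRESHOLD 3

/* One cache entry; payload stands in for the slots and attrmap. */
typedef struct ToyEntry
{
    bool  valid;    /* loosely analogous to replicate_valid */
    void *payload;  /* memory that lingers after invalidation */
} ToyEntry;

typedef struct ToyCache
{
    ToyEntry entries[TOY_CACHE_SIZE];
    int      ninvalidated;  /* invalidations seen since the last sweep */
} ToyCache;

void toy_cache_init(ToyCache *cache)
{
    assert(cache != NULL);
    for (size_t i = 0; i < TOY_CACHE_SIZE; i++)
    {
        cache->entries[i].valid = false;
        cache->entries[i].payload = NULL;
    }
    cache->ninvalidated = 0;
}

/* (Re)build an entry; any stale payload is released on reuse. */
bool toy_cache_load(ToyCache *cache, size_t idx, size_t nbytes)
{
    if (idx >= TOY_CACHE_SIZE)
        return false;
    ToyEntry *e = &cache->entries[idx];
    free(e->payload);               /* free(NULL) is a no-op */
    e->payload = malloc(nbytes);
    e->valid = (e->payload != NULL);
    return e->valid;
}

/* Invalidation callback: only marks and counts, never frees. */
void toy_invalidate_cb(ToyCache *cache, size_t idx)
{
    if (idx >= TOY_CACHE_SIZE)
        return;
    ToyEntry *e = &cache->entries[idx];
    if (e->valid)
    {
        e->valid = false;
        cache->ninvalidated++;
    }
}

/*
 * Called at a point where no entry is being used. Frees the payloads of
 * invalidated entries once enough invalidations have accumulated, and
 * returns how many payloads were freed.
 */
int toy_sweep_if_needed(ToyCache *cache)
{
    int freed = 0;

    if (cache->ninvalidated < TOY_INVALIDATION_THRESHOLD)
        return 0;
    for (size_t i = 0; i < TOY_CACHE_SIZE; i++)
    {
        ToyEntry *e = &cache->entries[i];
        if (!e->valid && e->payload != NULL)
        {
            free(e->payload);
            e->payload = NULL;
            freed++;
        }
    }
    cache->ninvalidated = 0;
    return freed;
}

/* Number of entries that still hold memory, valid or not. */
int toy_live_payloads(const ToyCache *cache)
{
    int n = 0;
    for (size_t i = 0; i < TOY_CACHE_SIZE; i++)
        if (cache->entries[i].payload != NULL)
            n++;
    return n;
}
```

In this model, a caller would invoke toy_sweep_if_needed() between units of
work rather than from inside toy_invalidate_cb(), so an entry that is in the
middle of being used can never have its payload freed underneath it. Whether
the real callback has the same hazard is exactly what I am asking about.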

Best,
Xuneng
