memory leak in logical WAL sender with pgoutput's cachectx

From:	赵宇鹏(宇彭) <zhaoyupeng(dot)zyp(at)alibaba-inc(dot)com>
To:	"pgsql-hackers" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	memory leak in logical WAL sender with pgoutput's cachectx
Date:	2025-08-14 06:43:34
Message-ID:	f7af28a7-570f-40a1-9c1f-2c98559032e5.zhaoyupeng.zyp@alibaba-inc.com
Lists:	pgsql-hackers

Hi all,

We recently ran into a memory leak in a production logical-replication WAL-sender
process. A simplified reproduction script is attached.

If you run the script and then call MemoryContextStats(TopMemoryContext), you
will see something like:
"logical replication cache context: 562044928 total in 77 blocks;"
meaning “cachectx” has grown to about 536 MiB, and it keeps growing as the
number of tables increases.

The workload can be summarised as follows:
1. CREATE PUBLICATION FOR ALL TABLES
2. CREATE SUBSCRIPTION
3. Repeatedly CREATE TABLE and DROP TABLE

cachectx is used mainly for entry->old_slot, entry->new_slot and entry->attrmap
allocations. When a DROP TABLE causes an invalidation we only set
entry->replicate_valid = false; we do not free those allocations immediately.
They are freed only if the same entry is used again. In some workloads an entry
may never be reused, or it may be reused briefly and then become unreachable
forever. (The WAL sender may still need to decode WAL records for tables that
have already been dropped while it is processing the invalidation.)
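
To make the pattern concrete, here is a toy model in plain C. It is not
PostgreSQL code: ToyEntry, toy_get_entry, toy_invalidate, the fixed-size array
and the 4096-byte payload are all hypothetical stand-ins, and malloc/free stand
in for allocations in cachectx. It only mirrors the lifecycle described above:
invalidation flips a flag, and the old allocation is released only when the
same entry is looked up again.

```c
/* Toy model only: plain C, not PostgreSQL code. All names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_MAX_ENTRIES 1024
#define TOY_SLOT_BYTES  4096	/* stand-in for old_slot/new_slot/attrmap */

typedef struct ToyEntry
{
	bool		replicate_valid;
	void	   *payload;		/* stand-in for memory allocated in cachectx */
} ToyEntry;

static ToyEntry toy_cache[TOY_MAX_ENTRIES];	/* indexed by a fake relation id */
static size_t toy_bytes_held = 0;

/* Look up an entry; rebuild its payload if it is not currently valid. */
static ToyEntry *
toy_get_entry(unsigned int relid)
{
	ToyEntry   *e;

	if (relid >= TOY_MAX_ENTRIES)
		return NULL;
	e = &toy_cache[relid];

	if (!e->replicate_valid)
	{
		/* The stale payload is released only here, when the entry is reused. */
		if (e->payload != NULL)
		{
			assert(toy_bytes_held >= TOY_SLOT_BYTES);
			free(e->payload);
			e->payload = NULL;
			toy_bytes_held -= TOY_SLOT_BYTES;
		}
		e->payload = malloc(TOY_SLOT_BYTES);
		if (e->payload == NULL)
			return NULL;
		toy_bytes_held += TOY_SLOT_BYTES;
		e->replicate_valid = true;
	}
	return e;
}

/* Invalidation: only flips the flag; the payload stays allocated. */
static void
toy_invalidate(unsigned int relid)
{
	if (relid < TOY_MAX_ENTRIES)
		toy_cache[relid].replicate_valid = false;
}

/* Create-then-drop n distinct tables; returns the bytes still held. */
static size_t
toy_simulate_create_drop(unsigned int n)
{
	for (unsigned int i = 0; i < n; i++)
	{
		(void) toy_get_entry(i);	/* a change for the table is decoded */
		toy_invalidate(i);			/* the table is dropped */
	}
	return toy_bytes_held;
}
```

In this model, toy_simulate_create_drop(100) leaves 100 * TOY_SLOT_BYTES
allocated, because none of the 100 entries is ever looked up a second time.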

Given the current design I don’t see a simple fix. Perhaps RelationSyncCache
needs some kind of eviction/cleanup policy to prevent this memory growth in
such scenarios.
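
As a starting point for discussion, one possible shape of such a cleanup
policy is sketched below, again as a toy model in plain C rather than a patch.
Everything here is an assumption: the names, the counter of stale entries, the
threshold of 16, and the choice to sweep at the next lookup instead of inside
the invalidation path. I have not checked which points in the real code would
be safe for freeing these allocations.

```c
/* Sketch only: plain C, not PostgreSQL code. Names and threshold are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define SK_MAX_ENTRIES     1024
#define SK_SLOT_BYTES      4096	/* stand-in for per-entry allocations */
#define SK_SWEEP_THRESHOLD 16	/* sweep once this many stale entries exist */

typedef struct SkEntry
{
	bool		replicate_valid;
	void	   *payload;
} SkEntry;

static SkEntry sk_cache[SK_MAX_ENTRIES];
static size_t sk_bytes_held = 0;
static size_t sk_stale = 0;		/* invalid entries that still hold a payload */

/* Release the payload of every invalid entry. */
static void
sk_sweep(void)
{
	for (size_t i = 0; i < SK_MAX_ENTRIES; i++)
	{
		SkEntry    *e = &sk_cache[i];

		if (!e->replicate_valid && e->payload != NULL)
		{
			assert(sk_bytes_held >= SK_SLOT_BYTES);
			free(e->payload);
			e->payload = NULL;
			sk_bytes_held -= SK_SLOT_BYTES;
		}
	}
	sk_stale = 0;
}

/* Look up an entry, sweeping first if too many stale entries piled up. */
static SkEntry *
sk_get_entry(unsigned int relid)
{
	SkEntry    *e;

	if (relid >= SK_MAX_ENTRIES)
		return NULL;
	if (sk_stale >= SK_SWEEP_THRESHOLD)
		sk_sweep();

	e = &sk_cache[relid];
	if (!e->replicate_valid)
	{
		if (e->payload != NULL)	/* stale payload that survived: free on reuse */
		{
			free(e->payload);
			e->payload = NULL;
			sk_bytes_held -= SK_SLOT_BYTES;
			sk_stale--;
		}
		e->payload = malloc(SK_SLOT_BYTES);
		if (e->payload == NULL)
			return NULL;
		sk_bytes_held += SK_SLOT_BYTES;
		e->replicate_valid = true;
	}
	return e;
}

/* Invalidation: flip the flag and count the entry as stale; free nothing here. */
static void
sk_invalidate(unsigned int relid)
{
	if (relid >= SK_MAX_ENTRIES || !sk_cache[relid].replicate_valid)
		return;
	sk_cache[relid].replicate_valid = false;
	if (sk_cache[relid].payload != NULL)
		sk_stale++;
}

/* Create-then-drop n distinct tables; returns the bytes still held. */
static size_t
sk_simulate_create_drop(unsigned int n)
{
	for (unsigned int i = 0; i < n; i++)
	{
		(void) sk_get_entry(i);
		sk_invalidate(i);
	}
	return sk_bytes_held;
}
```

With this policy the memory held by stale entries in the toy model never
exceeds SK_SWEEP_THRESHOLD * SK_SLOT_BYTES between lookups, regardless of how
many tables are created and dropped. The sweep here is a linear scan; a real
implementation would probably want something cheaper, such as a list of
invalidated entries.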

Does anyone have ideas or suggestions?

Attachment	Content-Type	Size
100_cachectx_oom.pl	application/octet-stream	3.6 KB

Responses

RE: memory leak in logical WAL sender with pgoutput's cachectx at 2025-08-14 10:39:36 from Hayato Kuroda (Fujitsu)