Hi,
From what we see in our users’ production environments, the situation is exactly
as previously described. Creating a publication FOR ALL TABLES is very common,
because manually choosing individual tables to publish can be cumbersome.
Regular CREATE/DROP TABLE activity is also normal, and the tables are not
necessarily short-lived. Since walsender is intended to be a long-lived process,
its memory footprint keeps accumulating over time.
Even if we ignore DROP TABLE entirely and only consider a large number of tables
that must be published, RelationSyncEntry alone can consume substantial memory.
Many users run multiple walsenders on the same instance, which further increases
memory pressure.
In normal backend processes, many cache structures are never evicted. That
already causes issues, but it is at least somewhat tolerable because a backend
is considered short-lived and a periodic reconnect can release the memory.
A walsender, however, is expected to stay alive much longer, since nobody wants
replication sessions to be dropped regularly, so I am genuinely curious why
structures like RelationSyncEntry were not given an LRU-style eviction mechanism
from the start.
Adding an LRU mechanism to RelationSyncEntry has another benefit: it puts an
upper bound on the amount of work callbacks such as invalidation_cb have to do,
preventing the walsender from stalling when a large number of tables is
involved. I have therefore implemented a prototype of this idea (borrowing some
code from Hayato Kuroda). In theory it should keep memory usage under control
in more scenarios while introducing only minimal overhead; I will run
additional performance tests to confirm this.
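
To make the idea a bit more concrete, below is a minimal standalone sketch of an
LRU-bounded per-relation cache. It is not taken from the prototype; all names in
it (RelCacheEntry, MAX_CACHED_RELS, get_entry, and so on) are made up for
illustration. The actual change would instead hang an LRU list off the existing
RelationSyncEntry hash in pgoutput and evict the least recently used entry once
a cap is exceeded.

/* lru_sketch.c: standalone illustration only; all names here are made up. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_CACHED_RELS 4       /* small cap so the eviction is easy to see */
#define NBUCKETS        16

typedef uint32_t Oid;

typedef struct RelCacheEntry
{
    Oid     relid;
    /* ... per-relation state the real entry would carry ... */
    struct RelCacheEntry *hash_next;    /* bucket chain */
    struct RelCacheEntry *lru_prev;     /* LRU list, most recent at head */
    struct RelCacheEntry *lru_next;
} RelCacheEntry;

static RelCacheEntry *buckets[NBUCKETS];
static RelCacheEntry *lru_head, *lru_tail;
static int nentries;

/* Unlink an entry from the LRU list. */
static void lru_unlink(RelCacheEntry *e)
{
    if (e->lru_prev)
        e->lru_prev->lru_next = e->lru_next;
    else
        lru_head = e->lru_next;
    if (e->lru_next)
        e->lru_next->lru_prev = e->lru_prev;
    else
        lru_tail = e->lru_prev;
}

/* Put an entry at the head of the LRU list (most recently used). */
static void lru_push_head(RelCacheEntry *e)
{
    e->lru_prev = NULL;
    e->lru_next = lru_head;
    if (lru_head)
        lru_head->lru_prev = e;
    lru_head = e;
    if (!lru_tail)
        lru_tail = e;
}

/* Drop the least recently used entry so the cache stays under the cap. */
static void evict_lru(void)
{
    RelCacheEntry *victim = lru_tail;
    RelCacheEntry **pp = &buckets[victim->relid % NBUCKETS];

    lru_unlink(victim);
    while (*pp != victim)
        pp = &(*pp)->hash_next;
    *pp = victim->hash_next;
    free(victim);
    nentries--;
}

/* Find-or-create an entry, touching it so it becomes most recently used. */
static RelCacheEntry *get_entry(Oid relid)
{
    RelCacheEntry *e;

    for (e = buckets[relid % NBUCKETS]; e; e = e->hash_next)
    {
        if (e->relid == relid)
        {
            lru_unlink(e);
            lru_push_head(e);
            return e;
        }
    }

    if (nentries >= MAX_CACHED_RELS)
        evict_lru();

    e = calloc(1, sizeof(RelCacheEntry));   /* error handling omitted */
    e->relid = relid;
    e->hash_next = buckets[relid % NBUCKETS];
    buckets[relid % NBUCKETS] = e;
    lru_push_head(e);
    nentries++;
    return e;
}

int main(void)
{
    /* Touch more relations than the cap allows; the cache stays bounded. */
    for (Oid relid = 16384; relid < 16384 + 10; relid++)
        get_entry(relid);
    printf("entries kept: %d (cap %d)\n", nentries, MAX_CACHED_RELS);
    return 0;
}

The cap is a compile-time constant only for the sake of the example; whether the
real limit should be a GUC or something else is open. The point is simply that
both the memory footprint and the number of entries an invalidation callback has
to walk stay bounded.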
What do you think of this approach?
Best regards,