From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: 赵宇鹏(宇彭) <zhaoyupeng(dot)zyp(at)alibaba-inc(dot)com>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "kuroda(dot)hayato" <kuroda(dot)hayato(at)fujitsu(dot)com>
Subject: Re: memory leak in logical WAL sender with pgoutput's cachectx
Date: 2025-08-21 23:55:19
Message-ID: CAD21AoC-ap5y-ut-EZe4gWh_w7vdARbrk=nDL0i8rR+OdoVDuQ@mail.gmail.com
Lists: pgsql-hackers
On Thu, Aug 21, 2025 at 2:55 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Thu, Aug 21, 2025 at 2:03 PM 赵宇鹏(宇彭) <zhaoyupeng(dot)zyp(at)alibaba-inc(dot)com> wrote:
> >
> > From what we see in our users’ production environments, the situation is exactly
> > as previously described. Creating a “publication for all tables” is very common,
> > because manually choosing individual tables to publish can be cumbersome.
> >
>
> I understand the difficulty in choosing individual tables, but OTOH
> publishing all tables may replicate tables that are not even
> required, consuming CPU, network, and disk space for no real need.
> That sounds more harmful than carefully defining publications once.
> We have another way to combine tables, which is to use TABLES IN
> SCHEMA. I don't know whether that can help, but anyway, we can't do
> much if the user wants to replicate all tables.
>
> > Regular CREATE/DROP TABLE activity is also normal, and the tables are not
> > necessarily short-lived. Since walsender is intended to be a long-lived process,
> > its memory footprint keeps accumulating over time.
> >
> > Even if we ignore DROP TABLE entirely and only consider a large number of tables
> > that must be published, RelationSyncEntry alone can consume substantial memory.
> > Many users run multiple walsenders on the same instance, which further increases
> > memory pressure.
> >
>
> Agreed, there is a possibility of consuming a large amount of
> memory, but still, we should do some experiments to see the real
> results. We haven't seen complaints of walsenders consuming large
> amounts of memory due to RelationSyncEntry entries, so it is also
> possible that this is just a theoretical case, especially after we
> have some handling for dropped tables.
>
> > In normal backend processes, many cache structures are never
> > evicted. That already causes issues, but it is at least somewhat
> > tolerable because a backend is considered short-lived and a
> > periodic reconnect can release the memory. A walsender, however,
> > is expected to stay alive much longer; nobody wants replication
> > sessions to be dropped regularly. So I am genuinely curious why
> > structures like RelationSyncEntry were not given an LRU-style
> > eviction mechanism from the start.
FYI, as far as I know we've discussed a similar problem in normal
backend processes as well, especially for users who use a connection
pooler[1] (that thread was about syscache bloat due to negative cache
entries).
Your proposed patch maintains cache entries with a hard limit of 1024
entries, using LRU eviction. This could become a performance
bottleneck when replicating more than 1024 tables. Even for databases
with fewer than 1024 tables, it would have a negative performance
impact because of the cost of maintaining LRU order on every cache
access. Also, while having a maximum capacity with an eviction
mechanism makes sense, we should make the limit configurable
(probably by size?). If the limit is expressed in size, it is worth
noting that cache entries consume variable amounts of memory, and
their underlying memory blocks might persist even after an entry is
evicted. A minimal sketch of the LRU pattern in question follows.
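
To make the cost concrete, here is a minimal, self-contained sketch of
a capped LRU cache. All names here (CacheEntry, cache_lookup, and so
on) are hypothetical illustrations, not the actual pgoutput
structures, and the lookup is a linear scan where real code would use
a hash table such as dynahash:

#include <stdio.h>
#include <stdlib.h>

#define CACHE_CAPACITY 4            /* stand-in for the patch's 1024 limit */

typedef struct CacheEntry
{
    unsigned int relid;             /* key: hypothetical relation OID */
    struct CacheEntry *prev;        /* toward most recently used */
    struct CacheEntry *next;        /* toward least recently used */
} CacheEntry;

static CacheEntry *lru_head;        /* most recently used */
static CacheEntry *lru_tail;        /* least recently used; evicted first */
static int cache_count;

/* Unlink an entry from the recency list. */
static void
lru_unlink(CacheEntry *e)
{
    if (e->prev)
        e->prev->next = e->next;
    else
        lru_head = e->next;
    if (e->next)
        e->next->prev = e->prev;
    else
        lru_tail = e->prev;
}

/* Link an entry at the MRU end of the list. */
static void
lru_push_head(CacheEntry *e)
{
    e->prev = NULL;
    e->next = lru_head;
    if (lru_head)
        lru_head->prev = e;
    lru_head = e;
    if (lru_tail == NULL)
        lru_tail = e;
}

/* Look up by key; note that every hit pays for the list reordering. */
static CacheEntry *
cache_lookup(unsigned int relid)
{
    CacheEntry *e;

    for (e = lru_head; e != NULL; e = e->next)
    {
        if (e->relid == relid)
        {
            if (e != lru_head)
            {
                lru_unlink(e);
                lru_push_head(e);
            }
            return e;
        }
    }
    return NULL;
}

/* Insert a new entry, evicting the LRU victim once at capacity. */
static CacheEntry *
cache_insert(unsigned int relid)
{
    CacheEntry *e;

    if (cache_count >= CACHE_CAPACITY)
    {
        CacheEntry *victim = lru_tail;

        lru_unlink(victim);
        free(victim);               /* a real patch also frees per-entry state */
        cache_count--;
    }
    e = calloc(1, sizeof(CacheEntry));
    e->relid = relid;
    lru_push_head(e);
    cache_count++;
    return e;
}

int
main(void)
{
    unsigned int relid;

    /* Insert five distinct "relations"; the first becomes the LRU victim. */
    for (relid = 1; relid <= 5; relid++)
        cache_insert(relid);
    printf("relid 1: %s\n", cache_lookup(1) ? "cached" : "evicted");
    printf("relid 5: %s\n", cache_lookup(5) ? "cached" : "evicted");
    return 0;
}

The point of contention is the reordering in cache_lookup(): it runs
on every hit, even when the cache never comes close to its capacity.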
> >
> > Adding an LRU mechanism to RelationSyncEntry has another benefit: it puts an
> > upper bound on the workload of callbacks such as invalidation_cb, preventing
> > walsender from stalling when there are a large number of tables. I have
> > therefore implemented a prototype of this idea (borrowing some code from
> > Hayato Kuroda). It should keep memory usage under control in more scenarios
> > while introducing only minimal overhead in theory.
>
> Yeah, such an idea is worth considering after we have some tests to
> show the high memory usage. BTW, I think it would be better to run
> some algorithm to evict entries only when we reach the threshold
> limit, as we do for shared buffers (see StrategyGetBuffer).
> Otherwise, we may be paying the cost of maintaining such a list even
> though in practice it may be needed only very rarely.
Yeah, something like clock-sweep could work too. I think we need to
select the cache management algorithm carefully. While it is well
known that maintaining LRU ordering is expensive, I'm really not sure
clock-sweep is a suitable algorithm for logical replication's cache
use cases either. Also, we might want to avoid iterating over all
cache entries to find ones to evict. A comparable sketch of the
clock-sweep approach is below.
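
For comparison, here is a minimal sketch of the clock-sweep idea in
the spirit of StrategyGetBuffer. The slot layout and names are
hypothetical, not PostgreSQL code; the point is that a cache hit only
sets a reference bit, so the bookkeeping cost is paid at eviction
time rather than on every access:

#include <stdbool.h>
#include <stdio.h>

#define NSLOTS 4                    /* stand-in for a configurable cap */

typedef struct
{
    unsigned int relid;             /* 0 means the slot is empty */
    bool        referenced;         /* set on access, cleared by the sweep */
} ClockSlot;

static ClockSlot slots[NSLOTS];
static int clock_hand;

/* A cache hit just sets a flag; no list reordering is needed. */
static void
slot_touch(int i)
{
    slots[i].referenced = true;
}

/*
 * Advance the hand, clearing reference bits, until a slot whose bit is
 * already clear is found; that slot is the eviction victim.  This
 * terminates within two sweeps even if every bit starts out set.
 */
static int
clock_evict(void)
{
    for (;;)
    {
        int i = clock_hand;

        clock_hand = (clock_hand + 1) % NSLOTS;
        if (!slots[i].referenced)
            return i;
        slots[i].referenced = false;    /* give it a second chance */
    }
}

static void
cache_insert(unsigned int relid)
{
    int i;

    for (i = 0; i < NSLOTS; i++)
    {
        if (slots[i].relid == 0)        /* a free slot is available */
        {
            slots[i].relid = relid;
            slot_touch(i);
            return;
        }
    }
    i = clock_evict();
    printf("evicting relid %u for relid %u\n", slots[i].relid, relid);
    slots[i].relid = relid;
    slot_touch(i);
}

int
main(void)
{
    unsigned int relid;

    for (relid = 1; relid <= 5; relid++)
        cache_insert(relid);
    return 0;
}

Here a hit is a single flag write rather than a list reorder; whether
that access pattern actually fits pgoutput's cache is exactly the
open question above.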
Regards,
[1] https://www.postgresql.org/message-id/20161219.201505.11562604.horiguchi.kyotaro%40lab.ntt.co.jp
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com