Quick Links

RE: memory leak in logical WAL sender with pgoutput's cachectx

From:	"Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To:	Xuneng Zhou <xunengzhou(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc:	赵宇鹏(宇彭) <zhaoyupeng(dot)zyp(at)alibaba-inc(dot)com>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	RE: memory leak in logical WAL sender with pgoutput's cachectx
Date:	2025-08-15 05:25:50
Message-ID:	OS0PR01MB5716C4A6AEC69BE733820AA49434A@OS0PR01MB5716.jpnprd01.prod.outlook.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Friday, August 15, 2025 10:59 AM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
> Thanks for your clarification!
>
> On Fri, Aug 15, 2025 at 10:10 AM Zhijie Hou (Fujitsu) <houzj(dot)fnst(at)fujitsu(dot)com>
> wrote:
> >
> > On Thursday, August 14, 2025 8:49 PM Hayato Kuroda (Fujitsu)
> <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
> > >
> > > Dear Xuneng,
> > >
> > > > Is it safe to free the substructure from within
> rel_sync_cache_relation_cb()?
> > >
> > > You referred the comment in rel_sync_cache_relation_cb() right? I
> > > understood like that we must not access to any *system caches*, from
> > > the comment. Here we do not re-build caches so that we do not access
> > > to the syscaches - it is permitted.
> >
> > We should also avoid freeing the cache within the callback function,
> > as entries might still be accessed after the callback is invoked. This
> > callback can be triggered during the execution of pgoutput_change,
> > which poses a risk of concurrent access issues. For reference, see
> > commit 7f481b8, which addresses a similar issue. Ideally, cache
> > release should occur within the normal code execution path, such as within
> pgoutput_change() or get_rel_sync_entry().
>
> I am still getting familiar with this module, so I cannot make a sound
> confirmation for this yet. I'll take a look at the commit mentioned by Zhijie and
> trace the callback paths.
>
> > >
> > > > I’ also interested in the reasoning behind setting
> > > > NINVALIDATION_THRESHOLD to 100.
> > >
> > > This is the debatable point of this implementation. I set to 100
> > > because it was sufficient with the provided workload, but this may cause
> the replication lag.
> > > We may have to consider a benchmark workload, measure data, and
> > > consider the appropriate value. 100 is just an initial point.
> >
> > I think it's worthwhile to test the performance impact of this
> > approach. If the table is altered but not dropped, removing the
> > entries could introduce additional overhead. Although this overhead
> > might be negligible if within an acceptable threshold, it still
> > warrants consideration. Additionally, if the number of invalidations
> > is significant, the cleanup process may take a considerable amount of time,
> potentially resulting in performance degradation.
> > In such cases, it might be beneficial to consider destroying the hash
> > table and creating a new one. Therefore, analyzing these scenarios is
> essential.
>
> Intuitively, a heuristic threshold is straightforward but also can be hard to get it
> "right" since the workload varies.
> Designing and writing tests to reflect the real-world use cases well may not be
> that easy as well. I'm wondering whether it's beneficial to brainstorm other
> potential alternatives before settling on the current approach and start
> benchmarking and performance tuning.
>
> Sequential cleaning could be costful in certain cases. I don't know much about
> RelationSyncCache's life cycle, but it seems that we should be cautious on
> shortening it without causing subtle issues.

I don't have a great idea that can avoid introducing overhead in all cases.

An alternative approach I considered before is deleting the hash entry upon
invalidation if the entry is not in use. This would require adding an in_use
flag in RelationSyncEntry, setting it to true when get_rel_sync_entry is called,
and marking it as false once the entry is no longer in use. This approach is
inspired from the relcache invalidation. I am slightly not sure if it's worth
the effort, but just share this idea with a POC patch for reference, which has
been modified based on Kuroda-San's version.

Best Regards,
Hou zj

Attachment	Content-Type	Size
vApproach2-0001-delete-the-unused-hash-entry-on-invalidat.patch	application/octet-stream	9.3 KB

In response to

Re: memory leak in logical WAL sender with pgoutput's cachectx at 2025-08-15 02:59:29 from Xuneng Zhou

Responses

Re: memory leak in logical WAL sender with pgoutput's cachectx at 2025-08-15 08:09:50 from Masahiko Sawada

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Thomas Munro	2025-08-15 05:39:30	Re: Potential deadlock in pgaio_io_wait()
Previous Message	Michael Paquier	2025-08-15 05:25:35	Re: Compilation issues for HASH_STATISTICS and HASH_DEBUG options