Re: Oom on temp (un-analyzed table caused by JIT) V16.1 [ NOT Fixed ]

From: Michael Banck <mbanck(at)gmx(dot)net>
To: Kirk Wolak <wolakk(at)gmail(dot)com>
Cc: Daniel Gustafsson <daniel(at)yesql(dot)se>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Subject: Re: Oom on temp (un-analyzed table caused by JIT) V16.1 [ NOT Fixed ]
Date: 2024-02-22 21:49:20
Message-ID: 65d7c161.050a0220.68515.f205@mx.google.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Wed, Jan 24, 2024 at 02:50:52PM -0500, Kirk Wolak wrote:
> On Mon, Jan 22, 2024 at 1:30 AM Kirk Wolak <wolakk(at)gmail(dot)com> wrote:
> > On Fri, Jan 19, 2024 at 7:03 PM Daniel Gustafsson <daniel(at)yesql(dot)se> wrote:
> >> > On 19 Jan 2024, at 23:09, Kirk Wolak <wolakk(at)gmail(dot)com> wrote:
> > Thank you, that made it possible to build and run...
> > UNFORTUNATELY this has a CLEAR memory leak (visible in htop)
> > I am watching it already consuming 6% of my system memory.
> >
> Daniel,
> In the previous email, I made note that once the JIT was enabled, the
> problem exists in 17Devel.
> I re-included my script, which forced the JIT to be used...
>
> I attached an updated script that forced the settings.
> But this is still leaking memory (outside of the
> pg_backend_memory_context() calls).
> Probably because it's at the LLVM level? And it does NOT happen from
> planning/opening the query. It appears I have to fetch the rows to
> see the problem.

I had a look at this (and blogged about it here[1]) and was also
wondering what was going on with 17devel and the recent back-branch
releases, cause I could also reproduce those continuing memory leaks.

Adding some debug logging when llvm_inline_reset_caches() is called
solves the mystery: as you are calling a function, the fix that is in
17devel and the back-branch releases is not applicable and only after
the function returns llvm_inline_reset_caches() is being called (as
llvm_jit_context_in_use_count is greater than zero, presumably, so it
never reaches the call-site of llvm_inline_reset_caches()).

If you instead run your SQL in a DO-loop (as in the blog post) and not
as a PL/PgSQL function, you should see that it no longer leaks. This
might be obvious to some (and Andres mentioned it in
https://www.postgresql.org/message-id/20210421002056.gjd6rpe6toumiqd6%40alap3.anarazel.de)
but it took me a while to figure out/find.

Michael

[1] https://www.credativ.de/en/blog/postgresql-en/quick-benchmark-postgresql-2024q1-release-performance-improvements/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2024-02-22 22:04:04 Re: Add lookup table for replication slot invalidation causes
Previous Message Euler Taveira 2024-02-22 21:01:23 Re: speed up a logical replica setup