Re: terminate called after throwing an instance of 'std::bad_alloc' (llvmjit)

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: terminate called after throwing an instance of 'std::bad_alloc' (llvmjit)
Date: 2022-01-06 17:08:33
Message-ID: 20220106170833.GA7796@telsasoft.com
Lists: pgsql-hackers

There's no leak after running for ~5 weeks.

$ ps -O lstart,vsize,rss 17930
PID STARTED VSZ RSS S TTY TIME COMMAND
17930 Tue Nov 30 15:35:26 2021 1019464 117424 S ? 7-04:54:03 postgres: telsasoft ts 192.168.122.13(57640) idle

Unless you suggest otherwise, I'm planning to restart the DB soon and go back
to running the pgdg rpm binaries with jit=off rather than what I compiled and
patched locally.

On Thu, Nov 18, 2021 at 03:20:39PM -0600, Justin Pryzby wrote:
> On Wed, Nov 10, 2021 at 09:56:44AM -0600, Justin Pryzby wrote:
> > Thread starting here:
> > https://www.postgresql.org/message-id/20201001021609.GC8476%40telsasoft.com
> >
> > On Fri, Dec 18, 2020 at 05:56:07PM -0600, Justin Pryzby wrote:
> > > I'm 99% sure the "bad_alloc" is from LLVM. It happened multiple times on
> > > different servers (running a similar report) after setting jit=on during pg13
> > > upgrade, and never happened since re-setting jit=off.
> >
> > Since this recurred a few times recently (now running pg14.0), and I finally
> > managed to get a non-truncated corefile...
>
> I think the reason this recurred is that, since upgrading to pg14, I no longer
> had your memleak patches applied. I'd forgotten about it, but was probably
> running a locally compiled postgres with your patches applied.
>
> I should've mentioned that this crash was associated with the message from the
> original problem report:
>
> |terminate called after throwing an instance of 'std::bad_alloc'
> | what(): std::bad_alloc
>
> The leak discussed on other threads seems fixed by your patches - I compiled
> v14 and have been running with no visible leaks since last week.
> https://www.postgresql.org/message-id/flat/20210417021602.7dilihkdc7oblrf7@alap3.anarazel.de
>
> As I understand it, there's still an issue with an allocation failure causing
> SIGABRT rather than FATAL.
>
> It took me several tries to get the corefile since the process is huge, caused
> by the leak (and abrtd wanted to truncate it, nullifying its utility).
>
> -rw-------. 1 postgres postgres 8.4G Nov 10 08:57 /var/lib/pgsql/14/data/core.31345
>
> I installed more debug packages to get a fuller stacktrace.
>
> #0 0x00007f2497880337 in raise () from /lib64/libc.so.6
> No symbol table info available.
> #1 0x00007f2497881a28 in abort () from /lib64/libc.so.6
> No symbol table info available.
> #2 0x00007f2487cbf265 in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib64/llvm5.0/lib/libLLVM-5.0.so
> No symbol table info available.
> #3 0x00007f2487c66696 in __cxxabiv1::__terminate(void (*)()) () from /usr/lib64/llvm5.0/lib/libLLVM-5.0.so
> No symbol table info available.
> #4 0x00007f2487c666c3 in std::terminate() () from /usr/lib64/llvm5.0/lib/libLLVM-5.0.so
> No symbol table info available.
> #5 0x00007f2487c687d3 in __cxa_throw () from /usr/lib64/llvm5.0/lib/libLLVM-5.0.so
> No symbol table info available.
> #6 0x00007f2487c686cd in operator new(unsigned long) () from /usr/lib64/llvm5.0/lib/libLLVM-5.0.so
> No symbol table info available.
> #7 0x00007f2486477b9c in allocateBuckets (this=0x2ff7f38, this=0x2ff7f38, Num=<optimized out>) at /usr/src/debug/llvm-5.0.1.src/include/llvm/ADT/DenseMap.h:753
> No locals.
> #8 llvm::DenseMap<llvm::APInt, std::unique_ptr<llvm::ConstantInt, std::default_delete<llvm::ConstantInt> >, llvm::DenseMapAPIntKeyInfo, llvm::detail::DenseMapPair<llvm::APInt, std::unique_ptr<llvm::ConstantInt, std::default_delete<llvm::ConstantInt> > > >::grow (this=this@entry=0x2ff7f38, AtLeast=<optimized out>)
> at /usr/src/debug/llvm-5.0.1.src/include/llvm/ADT/DenseMap.h:691
> OldNumBuckets = 33554432
> OldBuckets = 0x7f23f3e42010
> #9 0x00007f2486477f29 in grow (AtLeast=<optimized out>, this=0x2ff7f38) at /usr/src/debug/llvm-5.0.1.src/include/llvm/ADT/DenseMap.h:461
> No locals.
> #10 InsertIntoBucketImpl<llvm::APInt> (TheBucket=<optimized out>, Lookup=..., Key=..., this=0x2ff7f38) at /usr/src/debug/llvm-5.0.1.src/include/llvm/ADT/DenseMap.h:510
> NewNumEntries = <optimized out>
> EmptyKey = <optimized out>
> #11 InsertIntoBucket<llvm::APInt const&> (Key=..., TheBucket=<optimized out>, this=0x2ff7f38) at /usr/src/debug/llvm-5.0.1.src/include/llvm/ADT/DenseMap.h:471
> No locals.
> #12 FindAndConstruct (Key=..., this=0x2ff7f38) at /usr/src/debug/llvm-5.0.1.src/include/llvm/ADT/DenseMap.h:271
> TheBucket = <optimized out>
> #13 operator[] (Key=..., this=0x2ff7f38) at /usr/src/debug/llvm-5.0.1.src/include/llvm/ADT/DenseMap.h:275
> No locals.
> #14 llvm::ConstantInt::get (Context=..., V=...) at /usr/src/debug/llvm-5.0.1.src/lib/IR/Constants.cpp:550
> pImpl = 0x2ff7eb0
> #15 0x00007f2486478263 in llvm::ConstantInt::get (Ty=0x2ff85a8, V=<optimized out>, isSigned=isSigned@entry=false) at /usr/src/debug/llvm-5.0.1.src/lib/IR/Constants.cpp:571
> No locals.
> #16 0x00007f248648673d in LLVMConstInt (IntTy=<optimized out>, N=<optimized out>, SignExtend=SignExtend@entry=0) at /usr/src/debug/llvm-5.0.1.src/lib/IR/Core.cpp:952
> No locals.
> #17 0x00007f2488f66c18 in l_ptr_const (type=0x3000650, ptr=<optimized out>) at ../../../../src/include/jit/llvmjit_emit.h:29
> c = <optimized out>
> #18 llvm_compile_expr (state=<optimized out>) at llvmjit_expr.c:246
> op = 0x1a5317690
> opcode = EEOP_OUTER_VAR
> opno = 5
> parent = <optimized out>
> funcname = 0x1a53184e8 "evalexpr_4827_151"
> context = 0x1ba79b8
> b = <optimized out>
> mod = 0x1a5513d30
> eval_fn = <optimized out>
> entry = <optimized out>
> v_state = 0x1a5ce09e0
> v_econtext = 0x1a5ce0a08
> v_isnullp = 0x1a5ce0a30
> v_tmpvaluep = 0x1a5ce0aa8
> v_tmpisnullp = 0x1a5ce0b48
> starttime = {tv_sec = 10799172, tv_nsec = 781670770}
> endtime = {tv_sec = 7077194792, tv_nsec = 0}
> __func__ = "llvm_compile_expr"
> [...]
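
To illustrate the SIGABRT-rather-than-FATAL point in the quoted text: frames
#0 to #6 of the backtrace show operator new throwing std::bad_alloc with no
handler anywhere up the stack, so the C++ runtime calls std::terminate() and
then abort(). Below is a minimal sketch of the general technique of catching
the exception at a C/C++ boundary and turning it into an error return that a
C caller could report itself. It is not how PostgreSQL or LLVM handle this,
and not a proposed patch. The names try_grow_table, grow_table, AllocStatus
and kBudgetBytes are hypothetical, and the allocation failure is simulated so
that the example behaves the same on any machine.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical status codes for a C-facing wrapper (not a PostgreSQL or LLVM API).
enum AllocStatus { ALLOC_OK = 0, ALLOC_OUT_OF_MEMORY = 1 };

// Hypothetical memory budget, used only to simulate an allocation failure.
static const std::size_t kBudgetBytes = static_cast<std::size_t>(1) << 20;

// Stand-in for C++ library code that allocates with operator new.
// A real operator new reports failure by throwing std::bad_alloc; here the
// throw is triggered explicitly once the request exceeds the budget.
static void grow_table(std::size_t nbytes)
{
    if (nbytes > kBudgetBytes)
        throw std::bad_alloc();           // simulated out-of-memory
    std::vector<char> buckets(nbytes);    // real allocation for small requests
    buckets[0] = 1;
}

// Boundary wrapper callable from C. Without the try/catch, the exception
// would leave this function with no handler on the stack, and the runtime
// would call std::terminate(), which by default calls abort() (SIGABRT).
extern "C" int try_grow_table(std::size_t nbytes)
{
    try {
        grow_table(nbytes);
        return ALLOC_OK;
    } catch (const std::bad_alloc &) {
        return ALLOC_OUT_OF_MEMORY;       // caller decides how to report it
    }
}
```

A C caller would check the return value and raise its own error on
ALLOC_OUT_OF_MEMORY rather than having the process aborted underneath it.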
