Re: BUG #16696: Backend crash in llvmjit

From: Dmitry Marakasov <amdmi3(at)amdmi3(dot)ru>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #16696: Backend crash in llvmjit
Date: 2020-11-04 21:20:15
Message-ID: 20201104212015.GA30304@hades.panopticon
Lists: pgsql-bugs

* Andres Freund (andres(at)anarazel(dot)de) wrote:

> > > Environment details:
> > > - FreeBSD 12.1 amd64
> > > - PostgreSQL 13.0 (built from FreeBSD ports)
> > > - llvm-10.0.1 (built from FreeBSD ports)
> >
> > My bad, it's actually llvm-9.0.1. Multiple llvm versions are installed on
> > the system, and PostgreSQL uses llvm9:
> >
> > ldd /usr/local/lib/postgresql/llvmjit.so | grep LLVM
> > libLLVM-9.so => /usr/local/llvm90/lib/libLLVM-9.so (0x800e00000)
>
> Could you try generating a backtrace after turning jit_debugging_support on? That might give a bit more information.
>
> I'll check once I'm home whether I can reproduce in my environment.

I did some digging. First of all, I've discovered that the problem
goes away if LLVM bitcode optimization is disabled (by commenting out
the llvm_optimize_module() call).

I've dumped the bitcode and compiled it back to assembly to compare it
with the gdb disassembly of the failing function. It didn't match
perfectly, but this place looked similar:

# %bb.84: # %op.32.inputcall
movq %rax, 5267(%r13)
movb %bl, 5275(%r13)
movb $0, 5263(%r13)
movzbl (%rax), %esi
movl __mb_sb_limit(%rip), %edi
movq _ThreadRuneLocale@GOTTPOFF(%rip), %rcx
movq %fs:0, %rdx
movq (%rdx,%rcx), %rcx
cmpl %esi, %edi
movq %rax, -96(%rbp) # 8-byte Spill
movl %edi, -72(%rbp) # 4-byte Spill
movq %rcx, -64(%rbp) # 8-byte Spill
jle .LBB1_85

Here's my hypothesis:

The problem happens when the boolin() function is inlined by LLVM.
boolin() calls isspace() internally, which on FreeBSD is
locale-specific and involves caching some locale parameters in
a thread-local variable defined as

extern _Thread_local const _RuneLocale *_ThreadRuneLocale;

Execution crashes when trying to access this thread-local variable,
probably because something related to TLS is not set up properly in/for
LLVM.
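
To illustrate the pattern I mean, here is a minimal sketch. All names in
it (demo_rune_locale, demo_thread_locale, demo_isspace) are made up for
illustration; this is not the actual FreeBSD libc code, only the same
shape: a character-class lookup that first reads a thread-local pointer
and falls back to a global table. On amd64 that read becomes an
%fs-relative load like the one in the listing above.

```c
#include <stddef.h>

/* Hypothetical stand-in for FreeBSD's _RuneLocale: just a whitespace table. */
typedef struct demo_rune_locale {
    unsigned char is_space[256];
} demo_rune_locale;

/* Process-wide default table covering the ASCII whitespace characters. */
static const demo_rune_locale demo_global_locale = {
    .is_space = {
        ['\t'] = 1, ['\n'] = 1, ['\v'] = 1,
        ['\f'] = 1, ['\r'] = 1, [' '] = 1
    }
};

/*
 * Per-thread override, analogous in spirit to _ThreadRuneLocale.
 * It is static here to keep the sketch self-contained; the real variable
 * is extern, so it is reached through the GOT (the @GOTTPOFF relocation).
 */
static _Thread_local const demo_rune_locale *demo_thread_locale = NULL;

/* Reading demo_thread_locale is the step that needs working TLS. */
static const demo_rune_locale *demo_current_locale(void)
{
    const demo_rune_locale *rl = demo_thread_locale;

    return rl != NULL ? rl : &demo_global_locale;
}

/* Locale-aware whitespace test built on the thread-local lookup. */
static int demo_isspace(int c)
{
    if (c < 0 || c >= 256)
        return 0;
    return demo_current_locale()->is_space[c];
}
```

If the TLS block for the calling thread is not where the generated code
expects it, the load in demo_current_locale() is the one that would go
wrong, before any table lookup happens.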

I've confirmed this hypothesis by disabling the isspace() calls in
boolin(), which also made the crash go away.
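
For anyone who wants to repeat the experiment without dropping whitespace
handling altogether, one option is an ASCII-only check that touches no
locale state and therefore no TLS. The sketch below is an assumption on
my side, not what PostgreSQL does: demo_ascii_isspace() and demo_trim()
are hypothetical names, and the trimming loop only roughly mirrors how
boolin() strips whitespace around its input.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: ASCII-only whitespace test, no locale or TLS access. */
static bool demo_ascii_isspace(unsigned char c)
{
    return c == ' ' || c == '\t' || c == '\n' ||
           c == '\v' || c == '\f' || c == '\r';
}

/*
 * Skip leading whitespace and compute the length without trailing
 * whitespace. Returns a pointer to the first non-space character and
 * stores the trimmed length in *len.
 */
static const char *demo_trim(const char *str, size_t *len)
{
    while (demo_ascii_isspace((unsigned char) *str))
        str++;

    *len = strlen(str);
    while (*len > 0 && demo_ascii_isspace((unsigned char) str[*len - 1]))
        (*len)--;

    return str;
}
```

This only sidesteps the thread-local access for this one function; it
says nothing about why TLS is broken in the JIT-compiled code.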

--
Dmitry Marakasov . 55B5 0596 FF1E 8D84 5F56 9510 D35A 80DD F9D2 F77D
amdmi3(at)amdmi3(dot)ru ..: https://github.com/AMDmi3
