Re: LLVM JIT: any JIT-compiled query crashes (SIGILL) on a libLLVM 19 + ASAN build

From: "Matheus Alcantara" <matheusssilv97(at)gmail(dot)com>
To: <assam258(at)gmail(dot)com>
Cc: "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: LLVM JIT: any JIT-compiled query crashes (SIGILL) on a libLLVM 19 + ASAN build
Date: 2026-06-15 09:46:06
Message-ID: DJ9IZJYU9J00.345IGM5JLMMNC@gmail.com
Lists: pgsql-hackers

On Thu Jun 11, 2026 at 10:48 PM -03, Henson Choi wrote:
>> I think that the fix is to filter out sanitizer flags when generating
>> bitcode for the JIT code [...]
>> With this fix, JIT works correctly under ASAN + LLVM 19 on my machine.
>
> Confirmed here too: with your filter applied the crash is gone and the JIT
> runs normally under ASAN. Filtering the sanitizer flags out of the
> bitcode is the right fix.
>

Thanks for confirming.
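
For anyone following along, the filtering idea can be sketched roughly as
below. This is only an illustration in Python, not the meson or autoconf
code from the patches; the helper name filter_sanitizer_flags and the
assumption that every sanitizer option starts with "-fsanitize" are mine.

```python
def filter_sanitizer_flags(cflags):
    """Return a copy of cflags without sanitizer options.

    Hypothetical helper for illustration only. Assumption: every
    sanitizer-related option starts with "-fsanitize" (for example
    -fsanitize=address or -fsanitize-recover=...).
    """
    return [flag for flag in cflags if not flag.startswith("-fsanitize")]


# Example: flags of an ASAN build with debug info enabled.
backend_cflags = ["-O2", "-ggdb", "-fsanitize=address", "-fno-omit-frame-pointer"]

# The flags used for JIT bitcode keep -ggdb but lose the instrumentation.
bitcode_cflags = filter_sanitizer_flags(backend_cflags)
print(bitcode_cflags)
```

The point is that debug info stays in the bitcode flags while the
instrumentation is dropped, which matches what was tested above.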

>> the sanitizer instrumentation may change struct layouts in the generated
>> LLVM IR [...] FIELDNO_EXPRSTATE_PARENT = 11 [...]
>
> One nit: on libLLVM 20.1.8 the bitcode struct layout is identical with and
> without -fsanitize=address (e.g. %struct.ExprState, index 11 stays a
> pointer), so it isn't a FIELDNO/layout mismatch here. In short, the crash
> needs debug info (-ggdb) and sanitizer instrumentation to both land in the
> JIT bitcode: the SIGILL is in decodeDiscriminator(), i.e. the instrumented
> IR going through the debug-info path. Your fix keeps the debug info but
> drops the instrumentation, and that alone stops it -- so the
> instrumentation is the trigger. The LLVM 19 assertion is likely the same
> cause surfacing differently.
>

OK, that makes sense.

>> I'm also wondering if this happens only with LLVM 19 or other versions
>> too.
>
> Not LLVM 19 only -- I reproduced the same SIGILL on libLLVM 20.1.8.
>
> v2 series attached, folding in your fix:
>
> 0001 Add a "jit" regression test (renamed/minimized from "jit_crash").
> jit is off by default now, so this turns it on to push a trivial
> query through the JIT provider.
>
> 0002 Your meson fix, with an added warning() so a sanitizer build knows
> its JIT code won't be instrumented. (Author: Matheus Alcantara.)
>
> 0003 Same for autoconf: filter sanitizer flags from BITCODE_CFLAGS/
> CXXFLAGS with a configure warning, plus -g under --enable-debug so
> the bitcode keeps debug info. The -g part is a judgment call --
> autoconf just rebuilds BITCODE_CFLAGS from a whitelist that
> doesn't include -g -- so feel free to keep or drop it.
>
> Tested on both build systems with an ASAN backend: the jit test crashes
> before and passes after, JIT stays functional (pg_jit_available() = t,
> EXPLAIN ANALYZE shows functions compiled), and the warning fires.
>

Thanks. I think the patches look good, and I also think it's good to
have a JIT test case since JIT is off by default. I'm just wondering
whether the test patch should be 0003 instead of 0001, since it will
break CI if committed before the meson and autoconf changes.

--
Matheus Alcantara
EDB: https://www.enterprisedb.com
