LLVM JIT: any JIT-compiled query crashes (SIGILL) on a libLLVM 19 + ASAN build

From: Henson Choi <assam258(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: LLVM JIT: any JIT-compiled query crashes (SIGILL) on a libLLVM 19 + ASAN build
Date: 2026-06-10 02:09:16
Message-ID: CAAAe_zD6jGANGZFKnHLKHF8izqmqqJbVe=NOuERFwN_Spj5VOA@mail.gmail.com
Lists: pgsql-hackers

Hi hackers,

While looking into Andres Freund's note that cfbot is failing with crashes
inside the JIT on the Row Pattern Recognition patch [1], I found that the
crash is not specific to that patch at all: on the CI's AddressSanitizer
build with LLVM 19, any query that is pushed through the LLVM JIT code
generator crashes the backend with SIGILL. It reproduces on plain master
with a trivial aggregate, so I am reporting it as its own issue, separate
from that feature.

Minimal reproduction
--------------------

SET jit = on;
SET jit_above_cost = 0;
SET jit_optimize_above_cost = 0;
SET jit_inline_above_cost = 0;

SELECT count(*)
FROM (SELECT i, i * 2 + 1 AS x
      FROM generate_series(1, 100000) i
      WHERE i % 3 = 0) t;

Result:

server closed the connection unexpectedly
...
LOG: client backend (PID NNNNN) was terminated by signal 4: Illegal instruction

A backend forked by a running postmaster is required to reproduce this
reliably; single-user mode does not trigger it. With jit = off the same
query runs fine.
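
On a working build, one way to confirm that these settings really send the query through the JIT is to inspect the EXPLAIN ANALYZE output. This is a sketch rather than part of the original report, and it assumes a server built with LLVM support. pg_jit_available() reports whether JIT compilation is available in the session, and a JIT-compiled plan ends with a "JIT:" section. Because EXPLAIN ANALYZE executes the query, it would presumably crash the same way on the affected build.

```sql
-- Reports whether JIT compilation is available in this session.
SELECT pg_jit_available();

-- Force JIT for even the cheapest plans, as in the reproduction above.
SET jit = on;
SET jit_above_cost = 0;

-- On a build where JIT works, the output ends with a "JIT:" section
-- listing the number of compiled functions and the timing breakdown.
EXPLAIN (ANALYZE, COSTS OFF)
SELECT count(*)
FROM (SELECT i, i * 2 + 1 AS x
      FROM generate_series(1, 100000) i
      WHERE i % 3 = 0) t;
```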

Environment
-----------

This is the cfbot Linux task environment:

- Debian Trixie, libLLVM 19.1
- CFLAGS = -O2 -ggdb -fno-sanitize-recover=all -fsanitize=address
- LDFLAGS = -fsanitize=address
- meson: -Dcassert=true -Dinjection_points=true --buildtype=debug
  -Dllvm=enabled (auto_features=disabled)

I reproduced this in a container that mirrors the CI configuration, and also
on a from-scratch build of plain upstream master
(89eafad297a9b01ad77cfc1ab93a433e0af894b0, "Fix tuple deforming with virtual
generated columns"), which contains no in-flight feature patches.

Backtrace
---------

The stack is corrupted at the crash, but with libLLVM debug info the top
frames resolve consistently to:

Program terminated with signal SIGILL, Illegal instruction.
#0  getUnsignedFromPrefixEncoding ()
    at llvm/include/llvm/Support/Discriminator.h:34
#1  decodeDiscriminator ()
    at llvm/lib/IR/DebugInfoMetadata.cpp:283

The crashing rip lands in the middle of a valid instruction
(decodeDiscriminator+48, the immediate byte of "and $0x1f,%r10d"), i.e. the
libLLVM code itself is intact and control flow was transferred into it at a
bad offset. The crash always lands at the same place, for every JIT-compiled
query, which suggests it is systematic rather than random corruption. It
surfaces in libLLVM's debug-info (discriminator) handling, and persists with
JIT inlining and optimization both disabled.
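
For reference, inlining and expensive optimization can each be switched off with a cost threshold of -1, while jit_above_cost = 0 still forces compilation. The report does not state the exact settings used for this check, so the following is only an assumption about one way to run it:

```sql
SET jit = on;
SET jit_above_cost = 0;            -- still JIT-compile every query
SET jit_optimize_above_cost = -1;  -- -1 disables expensive optimization
SET jit_inline_above_cost = -1;    -- -1 disables inlining

-- Same query as in the minimal reproduction.
SELECT count(*)
FROM (SELECT i, i * 2 + 1 AS x
      FROM generate_series(1, 100000) i
      WHERE i % 3 = 0) t;
```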

Reproducer patch
----------------

The attached patch adds a small "jit_crash" regression test that forces the
JIT compiler (jit on, all jit_*_above_cost set to 0) using a plain aggregate
over generate_series(). On a working installation it passes; on the broken
LLVM 19 + ASAN environment it crashes as above. I have also registered it in
the commitfest so cfbot exercises it directly.

References
----------

[1] https://www.postgresql.org/message-id/p7r5bekdbl2zcazid7agvfo2nfnq5bim2a5jkckqygld32n325%40fctfp6ou6qnb

Thanks,
Henson Choi

Attachment: v1-0001-Add-jit_crash-regression-test-to-force-LLVM-JIT-c.patch (application/octet-stream, 4.7 KB)
