| From: | Henson Choi <assam258(at)gmail(dot)com> |
|---|---|
| To: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de> |
| Cc: | Tatsuo Ishii <ishii(at)postgresql(dot)org>, jian he <jian(dot)universality(at)gmail(dot)com> |
| Subject: | Re: LLVM JIT: any JIT-compiled query crashes (SIGILL) on a libLLVM 19 + ASAN build |
| Date: | 2026-06-10 02:43:30 |
| Message-ID: | CAAAe_zB69xai4mYxiRbL3n3PtDJcr2E5KgEicihsOg5d4OewfA@mail.gmail.com |
| Lists: | pgsql-hackers |
Correction: RPR stands for Row Pattern Recognition, not "Reported-Problem Reproduction".
On Wed, Jun 10, 2026 at 11:42 AM, Henson Choi <assam258(at)gmail(dot)com> wrote:
> Hi Andres,
>
> Just to let you know — the CI run for this commitfest entry shows the
> same crash independently on master as well, so this may not be an RPR
> (Reported-Problem Reproduction) issue specific to the patch.
>
> The identical crash occurs on a standalone test against master.
>
> Thanks,
> Henson
>
> On Wed, Jun 10, 2026 at 11:09 AM, Henson Choi <assam258(at)gmail(dot)com> wrote:
>
>> Hi hackers,
>>
>> While looking into Andres Freund's note that cfbot is failing with crashes
>> inside the JIT on the Row Pattern Recognition patch [1], I found that the
>> crash is not specific to that patch at all: on the CI's AddressSanitizer
>> build with LLVM 19, any query that is pushed through the LLVM JIT code
>> generator crashes the backend with SIGILL. It reproduces on plain master
>> with a trivial aggregate, so I am reporting it as its own issue, separate
>> from that feature.
>>
>> Minimal reproduction
>> --------------------
>>
>> SET jit = on;
>> SET jit_above_cost = 0;
>> SET jit_optimize_above_cost = 0;
>> SET jit_inline_above_cost = 0;
>>
>> SELECT count(*)
>> FROM (SELECT i, i * 2 + 1 AS x
>> FROM generate_series(1, 100000) i
>> WHERE i % 3 = 0) t;
>>
>> Result:
>>
>> server closed the connection unexpectedly
>> ...
>> LOG: client backend (PID NNNNN) was terminated by signal 4: Illegal instruction
>>
>> A postmaster (forked backend) is required to reproduce reliably; single-user
>> mode does not trip it. With jit = off the same query runs fine.
>>
>> Environment
>> -----------
>>
>> This is the cfbot Linux task environment:
>>
>> - Debian Trixie, libLLVM 19.1
>> - CFLAGS = -O2 -ggdb -fno-sanitize-recover=all -fsanitize=address
>> - LDFLAGS = -fsanitize=address
>> - meson: -Dcassert=true -Dinjection_points=true --buildtype=debug -Dllvm=enabled (auto_features=disabled)
>>
>> I reproduced this in a container that mirrors the CI configuration, and also
>> on a from-scratch build of plain upstream master
>> (89eafad297a9b01ad77cfc1ab93a433e0af894b0, "Fix tuple deforming with virtual
>> generated columns"), which contains no in-flight feature patches.
>>
>> Backtrace
>> ---------
>>
>> The stack is corrupted at the crash, but with libLLVM debug info the top
>> frames resolve consistently to:
>>
>> Program terminated with signal SIGILL, Illegal instruction.
>> #0 getUnsignedFromPrefixEncoding ()
>> at llvm/include/llvm/Support/Discriminator.h:34
>> #1 decodeDiscriminator ()
>> at llvm/lib/IR/DebugInfoMetadata.cpp:283
>>
>> The crashing rip lands in the middle of a valid instruction
>> (decodeDiscriminator+48, the immediate byte of "and $0x1f,%r10d"), i.e. the
>> libLLVM code itself is intact and control flow was transferred into it at a
>> bad offset. The crash always lands at the same place, for every JIT-compiled
>> query, which suggests it is systematic rather than random corruption. It
>> surfaces in libLLVM's debug-info (discriminator) handling, and persists with
>> JIT inlining and optimization both disabled.
>>
>> Reproducer patch
>> ----------------
>>
>> The attached patch adds a small "jit_crash" regression test that forces the
>> JIT compiler (jit on, all jit_*_above_cost set to 0) using a plain aggregate
>> over generate_series(). On a working installation it passes; on the broken
>> LLVM 19 + ASAN environment it crashes as above. I have also registered it in
>> the commitfest so cfbot exercises it directly.
>>
>> References
>> ----------
>>
>> [1] https://www.postgresql.org/message-id/p7r5bekdbl2zcazid7agvfo2nfnq5bim2a5jkckqygld32n325%40fctfp6ou6qnb
>>
>> Thanks,
>> Henson Choi
>>
>
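
For anyone who wants to script the check from the quoted report, here is a minimal sketch in Python. It only assembles the reproducer SQL from the settings and query quoted above and recognizes the "terminated by signal 4" line in a server log. The names build_repro_sql and log_shows_sigill are hypothetical helpers made up for this illustration; they are not part of the attached patch or of PostgreSQL.

```python
import re

# Session settings from the report: force the JIT path for every query.
JIT_SETTINGS = {
    "jit": "on",
    "jit_above_cost": "0",
    "jit_optimize_above_cost": "0",
    "jit_inline_above_cost": "0",
}

# The trivial aggregate from the "Minimal reproduction" section.
REPRO_QUERY = """SELECT count(*)
FROM (SELECT i, i * 2 + 1 AS x
      FROM generate_series(1, 100000) i
      WHERE i % 3 = 0) t;"""

# Matches the postmaster log line quoted in the report; the PID varies.
SIGILL_RE = re.compile(r"\(PID \d+\) was terminated by signal 4\b")


def build_repro_sql(settings=JIT_SETTINGS, query=REPRO_QUERY):
    """Return the SET statements followed by the reproducer query."""
    lines = [f"SET {name} = {value};" for name, value in settings.items()]
    return "\n".join(lines + ["", query]) + "\n"


def log_shows_sigill(log_text):
    """True if the log text contains a backend death by signal 4 (SIGILL)."""
    return SIGILL_RE.search(log_text) is not None
```

The generated SQL can be written to a file and run with psql -f against a server started by the postmaster, since the report notes that single-user mode does not trip the crash.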