Re: pg11.1 jit segv

From: Andres Freund <andres(at)anarazel(dot)de>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg11.1 jit segv
Date: 2018-11-27 03:00:35
Message-ID: 20181127030035.n6avagjgmolbrlw7@alap3.anarazel.de
Lists: pgsql-hackers

On 2018-11-17 17:37:15 -0600, Justin Pryzby wrote:
> On Fri, Nov 16, 2018 at 10:24:46AM -0600, Justin Pryzby wrote:
> > On Fri, Nov 16, 2018 at 08:38:26AM -0600, Justin Pryzby wrote:
> > > The table is not too special, but was probably ALTERed to add columns a good
> > > number of times by one of our processes. It has ~1100 columns, including
> > > arrays, and some with null_frac=1. I'm trying to come up with a test case
> > > involving column types and order.
>
> Try this ?
>
> SELECT 'DROP TABLE t; CREATE TABLE t (a3 text, a1 int[], '||array_to_string(array_agg('c'||i||' bigint default 0'),',')||'); INSERT INTO t VALUES(0)' FROM generate_series(1,999) i;
> \gexec
> SET jit=on; SET jit_above_cost=0; SELECT a3 FROM t LIMIT 9;
>
> That's given all sorts of nice errors:
>
> ERROR: invalid memory alloc request size 18446744073709551613
> ERROR: compressed data is corrupted
>
> And occasionally crashes and/or returns unrelated data:
>
> = '0', $21 = '0', $22 = '0', $23 = '0', $24 = '0', $25 = '2741'\x03
> n 21782 :constvalue 4 [ 0 0 0 0 0 0 0 0 ]}) :location

Ah, hah. The issue is that t_hoff is 128 or larger here (due to the
size of the NULL bitmap), and apparently getelementptr interprets an
i8 index >= 128 as a signed integer. That yields a negative offset from
the start of the tuple, which predictably doesn't work great.

    v_hoff =
        l_load_struct_gep(b, v_tuplep,
                          FIELDNO_HEAPTUPLEHEADERDATA_HOFF,
                          "t_hoff");
    v_tupdata_base =
        LLVMBuildGEP(b,
                     LLVMBuildBitCast(b,
                                      v_tuplep,
                                      l_ptr(LLVMInt8Type()),
                                      ""),
                     &v_hoff, 1,
                     "v_tupdata_base");

I'd missed the "These integers are treated as signed values where
relevant." bit in the getelementptr docs
http://llvm.org/docs/LangRef.html#getelementptr-instruction

The fix is easy enough, just adding a
    v_hoff = LLVMBuildZExt(b, v_hoff, LLVMInt32Type(), "");
fixes the issue for me.

Could you check that the attached patch also fixes your original
issue? I'm going through the code to see whether there are other
occurrences of this.

Greetings,

Andres Freund

Attachment Content-Type Size
hoff-fix.diff text/x-diff 624 bytes
