Re: Server crash on RHEL 9/s390x platform against PG16

From: Suraj Kharage <suraj(dot)kharage(at)enterprisedb(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Server crash on RHEL 9/s390x platform against PG16
Date: 2023-10-12 10:42:18
Message-ID: CAF1DzPXA3Tzdw+DT9kLYd7ZxXznhC53zfoANT4m-4Xr4etOqMg@mail.gmail.com
Lists: pgsql-hackers

Here is the clang version:

[edb(at)9428da9d2137]$ clang --version
clang version 15.0.7 (Red Hat 15.0.7-2.el9)
Target: s390x-ibm-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

Let me know if any further information is needed.
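
To make the suspicion from the quoted gdb output concrete, here is a toy Python model of the situation. It is only a sketch, not PostgreSQL code: heap_compute_data_size() skips attributes flagged NULL and, for variable-length attributes, dereferences the Datum as a pointer (heaptuple.c:227 in the backtrace), so a non-null text attribute whose Datum is 0 would crash there. The function name find_suspect_attrs is hypothetical. Only values[1] to values[3] come from the gdb session; the remaining values and the whole isnull array are assumptions based on the result rows seen with JIT disabled.

```python
# Toy model only; NOT PostgreSQL code. All names here are hypothetical.

VARLENA = -1  # attlen PostgreSQL uses for variable-length types such as text


def find_suspect_attrs(values, isnull, attlens):
    """Return 0-based indexes of variable-length attributes that are
    flagged non-null but whose Datum is 0 (a NULL pointer)."""
    suspects = []
    for i, (val, null, attlen) in enumerate(zip(values, isnull, attlens)):
        if null:
            continue  # NULL attributes are skipped; their Datum is never read
        if attlen == VARLENA and val == 0:
            suspects.append(i)
    return suspects


# Result columns: t1.pkey, t1.val, t2.pkey, t2.label, t2.hidden,
#                 t3.pkey, t3.val, t4.pkey
attlens = [4, VARLENA, 4, VARLENA, 1, 4, 4, 4]

# Indexes 1..3 are taken from the gdb session; the others are assumed.
values = [1, 34027556, 1, 0, 1, 1, 1, 0]
# Assumed: only the unmatched t4.pkey is flagged NULL.
isnull = [False, False, False, False, False, False, False, True]

print(find_suspect_attrs(values, isnull, attlens))  # prints [3]
```

Under these assumptions the only flagged attribute is index 3 (rm32044_t2.label), which matches the `p i` and `p val` output from the core dump.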

On Mon, Oct 9, 2023 at 8:21 AM Suraj Kharage <suraj(dot)kharage(at)enterprisedb(dot)com>
wrote:

> It looks like an issue with JIT. If I disable JIT, the query above
> runs successfully.
>
> postgres=# set jit to off;
> SET
> postgres=# SELECT * FROM rm32044_t1 LEFT JOIN rm32044_t2 ON
> rm32044_t1.pkey = rm32044_t2.pkey, rm32044_t3 LEFT JOIN rm32044_t4 ON
> rm32044_t3.pkey = rm32044_t4.pkey order by rm32044_t1.pkey,label,hidden;
>  pkey | val  | pkey |  label  | hidden | pkey | val | pkey
> ------+------+------+---------+--------+------+-----+------
>     1 | row1 |    1 | hidden  | t      |    1 |   1 |
>     1 | row1 |    1 | hidden  | t      |    2 |   1 |
>     2 | row2 |    2 | visible | f      |    1 |   1 |
>     2 | row2 |    2 | visible | f      |    2 |   1 |
> (4 rows)
>
> Any idea on this?
>
> On Mon, Sep 18, 2023 at 11:20 AM Suraj Kharage <
> suraj(dot)kharage(at)enterprisedb(dot)com> wrote:
>
>> A few more details on this:
>>
>> (gdb) p val
>> $1 = 0
>> (gdb) p i
>> $2 = 3
>> (gdb) f 3
>> #3 0x0000000001a1ef70 in ExecCopySlotMinimalTuple (slot=0x202e4f8) at
>> ../../../../src/include/executor/tuptable.h:472
>> 472 return slot->tts_ops->copy_minimal_tuple(slot);
>> (gdb) p *slot
>> $3 = {type = T_TupleTableSlot, tts_flags = 16, tts_nvalid = 8, tts_ops =
>> 0x1b6dcc8 <TTSOpsVirtual>, tts_tupleDescriptor = 0x202e0e8, tts_values =
>> 0x202e540, tts_isnull = 0x202e580, tts_mcxt = 0x1f54550, tts_tid =
>> {ip_blkid = {bi_hi = 65535,
>> bi_lo = 65535}, ip_posid = 0}, tts_tableOid = 0}
>> (gdb) p *slot->tts_tupleDescriptor
>> $2 = {natts = 8, tdtypeid = 2249, tdtypmod = -1, tdrefcount = -1, constr
>> = 0x0, attrs = 0x202cd28}
>>
>> (gdb) p slot.tts_values[3]
>> $4 = 0
>> (gdb) p slot.tts_values[2]
>> $5 = 1
>> (gdb) p slot.tts_values[1]
>> $6 = 34027556
>>
>>
>> As per the result slot, the attribute at index 3 (column label) has the
>> value 0.
>> I'm testing this in a Docker container and am facing some issues with gdb,
>> so I could not debug it further.
>>
>> Here is the explain plan:
>>
>> postgres=# explain (verbose, costs off) SELECT * FROM rm32044_t1 LEFT
>> JOIN rm32044_t2 ON rm32044_t1.pkey = rm32044_t2.pkey, rm32044_t3 LEFT JOIN
>> rm32044_t4 ON rm32044_t3.pkey = rm32044_t4.pkey order by
>> rm32044_t1.pkey,label,hidden;
>>
>>                                  QUERY PLAN
>> ---------------------------------------------------------------------------------------------------------------------------------------------------------
>>  Incremental Sort
>>    Output: rm32044_t1.pkey, rm32044_t1.val, rm32044_t2.pkey, rm32044_t2.label, rm32044_t2.hidden, rm32044_t3.pkey, rm32044_t3.val, rm32044_t4.pkey
>>    Sort Key: rm32044_t1.pkey, rm32044_t2.label, rm32044_t2.hidden
>>    Presorted Key: rm32044_t1.pkey
>>    ->  Merge Left Join
>>          Output: rm32044_t1.pkey, rm32044_t1.val, rm32044_t2.pkey, rm32044_t2.label, rm32044_t2.hidden, rm32044_t3.pkey, rm32044_t3.val, rm32044_t4.pkey
>>          Merge Cond: (rm32044_t1.pkey = rm32044_t2.pkey)
>>          ->  Sort
>>                Output: rm32044_t3.pkey, rm32044_t3.val, rm32044_t4.pkey, rm32044_t1.pkey, rm32044_t1.val
>>                Sort Key: rm32044_t1.pkey
>>                ->  Nested Loop
>>                      Output: rm32044_t3.pkey, rm32044_t3.val, rm32044_t4.pkey, rm32044_t1.pkey, rm32044_t1.val
>>                      ->  Merge Left Join
>>                            Output: rm32044_t3.pkey, rm32044_t3.val, rm32044_t4.pkey
>>                            Merge Cond: (rm32044_t3.pkey = rm32044_t4.pkey)
>>                            ->  Sort
>>                                  Output: rm32044_t3.pkey, rm32044_t3.val
>>                                  Sort Key: rm32044_t3.pkey
>>                                  ->  Seq Scan on public.rm32044_t3
>>                                        Output: rm32044_t3.pkey, rm32044_t3.val
>>                            ->  Sort
>>                                  Output: rm32044_t4.pkey
>>                                  Sort Key: rm32044_t4.pkey
>>                                  ->  Seq Scan on public.rm32044_t4
>>                                        Output: rm32044_t4.pkey
>>                      ->  Materialize
>>                            Output: rm32044_t1.pkey, rm32044_t1.val
>>                            ->  Seq Scan on public.rm32044_t1
>>                                  Output: rm32044_t1.pkey, rm32044_t1.val
>>          ->  Sort
>>                Output: rm32044_t2.pkey, rm32044_t2.label, rm32044_t2.hidden
>>                Sort Key: rm32044_t2.pkey
>>                ->  Seq Scan on public.rm32044_t2
>>                      Output: rm32044_t2.pkey, rm32044_t2.label, rm32044_t2.hidden
>> (34 rows)
>>
>>
>> It seems that while building the inner slot for the merge join, the value
>> for attnum 1 is not fetched correctly.
>>
>> On Tue, Sep 12, 2023 at 3:27 PM Suraj Kharage <
>> suraj(dot)kharage(at)enterprisedb(dot)com> wrote:
>>
>>> Hi,
>>>
>>> I found a server crash on the RHEL 9/s390x platform with the test case below.
>>>
>>> *Machine details:*
>>> [edb(at)9428da9d2137 postgres]$ cat /etc/redhat-release
>>> AlmaLinux release 9.2 (Turquoise Kodkod)
>>> [edb(at)9428da9d2137 postgres]$ lscpu
>>> Architecture: s390x
>>> CPU op-mode(s): 32-bit, 64-bit
>>> Address sizes: 39 bits physical, 48 bits virtual
>>> Byte Order: Big Endian
>>>
>>> *Configure command:*
>>> ./configure --prefix=/home/edb/postgres/ --with-lz4 --with-zstd
>>> --with-llvm --with-perl --with-python --with-tcl --with-openssl
>>> --enable-nls --with-libxml --with-libxslt --with-systemd --with-libcurl
>>> --without-icu --enable-debug --enable-cassert --with-pgport=5414
>>>
>>>
>>> *Test case:*
>>> CREATE TABLE rm32044_t1
>>> (
>>>     pkey integer,
>>>     val text
>>> );
>>> CREATE TABLE rm32044_t2
>>> (
>>>     pkey integer,
>>>     label text,
>>>     hidden boolean
>>> );
>>> CREATE TABLE rm32044_t3
>>> (
>>>     pkey integer,
>>>     val integer
>>> );
>>> CREATE TABLE rm32044_t4
>>> (
>>>     pkey integer
>>> );
>>> insert into rm32044_t1 values ( 1 , 'row1');
>>> insert into rm32044_t1 values ( 2 , 'row2');
>>> insert into rm32044_t2 values ( 1 , 'hidden', true);
>>> insert into rm32044_t2 values ( 2 , 'visible', false);
>>> insert into rm32044_t3 values (1 , 1);
>>> insert into rm32044_t3 values (2 , 1);
>>>
>>> postgres=# SELECT * FROM rm32044_t1 LEFT JOIN rm32044_t2 ON
>>> rm32044_t1.pkey = rm32044_t2.pkey, rm32044_t3 LEFT JOIN rm32044_t4 ON
>>> rm32044_t3.pkey = rm32044_t4.pkey order by rm32044_t1.pkey,label,hidden;
>>> server closed the connection unexpectedly
>>> This probably means the server terminated abnormally
>>> before or while processing the request.
>>> The connection to the server was lost. Attempting reset: Failed.
>>> The connection to the server was lost. Attempting reset: Failed.
>>>
>>> *backtrace:*
>>> [edb(at)9428da9d2137 postgres]$ gdb bin/postgres
>>> data/qemu_postgres_20230911-140628_65620.core
>>> Core was generated by `postgres: edb postgres [local] SELECT '.
>>> Program terminated with signal SIGSEGV, Segmentation fault.
>>> #0 0x00000000010a8366 in heap_compute_data_size
>>> (tupleDesc=tupleDesc(at)entry=0x1ba3d10, values=values(at)entry=0x1ba4168,
>>> isnull=isnull(at)entry=0x1ba41a8) at heaptuple.c:227
>>> 227 VARATT_CAN_MAKE_SHORT(DatumGetPointer(val)))
>>> [Current thread is 1 (LWP 65597)]
>>> Missing separate debuginfos, use: dnf debuginfo-install
>>> glibc-2.34-60.el9.s390x libcap-2.48-8.el9.s390x
>>> libedit-3.1-37.20210216cvs.el9.s390x libffi-3.4.2-7.el9.s390x
>>> libgcc-11.3.1-4.3.el9.alma.s390x libgcrypt-1.10.0-10.el9_2.s390x
>>> libgpg-error-1.42-5.el9.s390x libstdc++-11.3.1-4.3.el9.alma.s390x
>>> libxml2-2.9.13-3.el9_2.1.s390x libzstd-1.5.1-2.el9.s390x
>>> llvm-libs-15.0.7-1.el9.s390x lz4-libs-1.9.3-5.el9.s390x
>>> ncurses-libs-6.2-8.20210508.el9.s390x openssl-libs-3.0.7-17.el9_2.s390x
>>> systemd-libs-252-14.el9_2.3.s390x xz-libs-5.2.5-8.el9_0.s390x
>>> (gdb) bt
>>> #0 0x00000000010a8366 in heap_compute_data_size
>>> (tupleDesc=tupleDesc(at)entry=0x1ba3d10, values=values(at)entry=0x1ba4168,
>>> isnull=isnull(at)entry=0x1ba41a8) at heaptuple.c:227
>>> #1 0x00000000010a9bb0 in heap_form_minimal_tuple
>>> (tupleDescriptor=0x1ba3d10, values=0x1ba4168, isnull=0x1ba41a8) at
>>> heaptuple.c:1484
>>> #2 0x00000000016553fa in ExecCopySlotMinimalTuple (slot=<optimized
>>> out>) at ../../../../src/include/executor/tuptable.h:472
>>> #3 tuplesort_puttupleslot (state=state(at)entry=0x1be4d18, slot=slot(at)entry=0x1ba4120)
>>> at tuplesortvariants.c:610
>>> #4 0x00000000012dc0e0 in ExecIncrementalSort (pstate=0x1acb4d8) at
>>> nodeIncrementalSort.c:716
>>> #5 0x00000000012b32c6 in ExecProcNode (node=0x1acb4d8) at
>>> ../../../src/include/executor/executor.h:273
>>> #6 ExecutePlan (execute_once=<optimized out>, dest=0x1ade698,
>>> direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>,
>>> operation=CMD_SELECT, use_parallel_mode=<optimized out>,
>>> planstate=0x1acb4d8, estate=0x1acb258) at execMain.c:1670
>>> #7 standard_ExecutorRun (queryDesc=0x19ad338, direction=<optimized
>>> out>, count=0, execute_once=<optimized out>) at execMain.c:365
>>> #8 0x00000000014a6ae2 in PortalRunSelect (portal=portal(at)entry=0x1a63558,
>>> forward=forward(at)entry=true, count=0, count(at)entry=9223372036854775807,
>>> dest=dest(at)entry=0x1ade698) at pquery.c:924
>>> #9 0x00000000014a84e0 in PortalRun (portal=portal(at)entry=0x1a63558,
>>> count=count(at)entry=9223372036854775807, isTopLevel=isTopLevel(at)entry=true,
>>> run_once=run_once(at)entry=true, dest=dest(at)entry=0x1ade698,
>>> altdest=0x1ade698, qc=0x40007ff7b0) at pquery.c:768
>>> #10 0x00000000014a3c1c in exec_simple_query (
>>> query_string=0x19ea0e8 "SELECT * FROM rm32044_t1 LEFT JOIN
>>> rm32044_t2 ON rm32044_t1.pkey = rm32044_t2.pkey, rm32044_t3 LEFT JOIN
>>> rm32044_t4 ON rm32044_t3.pkey = rm32044_t4.pkey order by
>>> rm32044_t1.pkey,label,hidden;") at postgres.c:1274
>>> #11 0x00000000014a57aa in PostgresMain (dbname=<optimized out>,
>>> username=<optimized out>) at postgres.c:4637
>>> #12 0x00000000013fdaf6 in BackendRun (port=0x1a132c0, port=0x1a132c0) at
>>> postmaster.c:4464
>>> #13 BackendStartup (port=0x1a132c0) at postmaster.c:4192
>>> #14 ServerLoop () at postmaster.c:1782
>>> #15 0x00000000013fec34 in PostmasterMain (argc=argc(at)entry=3,
>>> argv=argv(at)entry=0x19a59a0) at postmaster.c:1466
>>> #16 0x0000000001096faa in main (argc=<optimized out>, argv=0x19a59a0) at
>>> main.c:198
>>>
>>> (gdb) p val
>>> $1 = 0
>>>
>>> Does anybody have any idea about this?
>>>
>>> --
>>> --
>>>
>>> Thanks & Regards,
>>> Suraj kharage,
>>>
>>>
>>>
>>> edbpostgres.com
>>>
>>
>

--
--

Thanks & Regards,
Suraj kharage,

edbpostgres.com
