Re: Server crash on RHEL 9/s390x platform against PG16

From: Suraj Kharage <suraj(dot)kharage(at)enterprisedb(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Server crash on RHEL 9/s390x platform against PG16
Date: 2023-09-18 05:50:41
Message-ID: CAF1DzPUwm+AL-8Q9bCdh-x4se_s9ZeVSnYEwqkoM1rh0+_m6EA@mail.gmail.com
Lists: pgsql-hackers

A few more details on this:

(gdb) p val
$1 = 0
(gdb) p i
$2 = 3
(gdb) f 3
#3 0x0000000001a1ef70 in ExecCopySlotMinimalTuple (slot=0x202e4f8) at
../../../../src/include/executor/tuptable.h:472
472 return slot->tts_ops->copy_minimal_tuple(slot);
(gdb) p *slot
$3 = {type = T_TupleTableSlot, tts_flags = 16, tts_nvalid = 8, tts_ops =
0x1b6dcc8 <TTSOpsVirtual>, tts_tupleDescriptor = 0x202e0e8, tts_values =
0x202e540, tts_isnull = 0x202e580, tts_mcxt = 0x1f54550, tts_tid =
{ip_blkid = {bi_hi = 65535,
bi_lo = 65535}, ip_posid = 0}, tts_tableOid = 0}
(gdb) p *slot->tts_tupleDescriptor
$2 = {natts = 8, tdtypeid = 2249, tdtypmod = -1, tdrefcount = -1, constr =
0x0, attrs = 0x202cd28}

(gdb) p slot.tts_values[3]
$4 = 0
(gdb) p slot.tts_values[2]
$5 = 1
(gdb) p slot.tts_values[1]
$6 = 34027556

As per the result slot, the value at index 3 (column label) is 0.
I'm testing this in a Docker container and am facing some issues with gdb,
so I have not been able to debug it further.
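
To illustrate why a zero Datum at index 3 is fatal here: the crash site (heaptuple.c:227) applies VARATT_CAN_MAKE_SHORT to DatumGetPointer(val), so for a variable-length column that is not flagged null, a Datum of 0 is dereferenced as a NULL pointer. Below is a minimal, self-contained sketch of a check for that condition. It is a simplified model, not the PostgreSQL source: DemoAttr and find_null_pointer_attr are hypothetical names, and the Datum typedef is a stand-in.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; NOT the PostgreSQL definitions. */
typedef uintptr_t Datum;

typedef struct DemoAttr
{
    int attlen;     /* -1 marks a variable-length (varlena-like) column */
} DemoAttr;

/*
 * Return the index of the first attribute that is variable-length, is not
 * flagged null, and yet carries a Datum of 0 (a NULL pointer).
 * Return -1 if no such attribute exists.
 */
static int
find_null_pointer_attr(const DemoAttr *attrs, const Datum *values,
                       const bool *isnull, int natts)
{
    for (int i = 0; i < natts; i++)
    {
        if (isnull[i])
            continue;           /* null attributes are never dereferenced */

        if (attrs[i].attlen == -1 && values[i] == 0)
            return i;           /* would be dereferenced as a NULL pointer */
    }
    return -1;
}
```

With the first four output columns modelled as int, text, int, text, the values {1, 34027556, 1, 0} (the first value is an assumed placeholder, since gdb output for index 0 is not shown) and all null flags false, this returns 3, which matches i = 3 in the gdb session.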

Here is the explain plan:

postgres=# explain (verbose, costs off) SELECT * FROM rm32044_t1 LEFT JOIN
rm32044_t2 ON rm32044_t1.pkey = rm32044_t2.pkey, rm32044_t3 LEFT JOIN
rm32044_t4 ON rm32044_t3.pkey = rm32044_t4.pkey order by
rm32044_t1.pkey,label,hidden;

                                                                       QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Incremental Sort
   Output: rm32044_t1.pkey, rm32044_t1.val, rm32044_t2.pkey, rm32044_t2.label, rm32044_t2.hidden, rm32044_t3.pkey, rm32044_t3.val, rm32044_t4.pkey
   Sort Key: rm32044_t1.pkey, rm32044_t2.label, rm32044_t2.hidden
   Presorted Key: rm32044_t1.pkey
   ->  Merge Left Join
         Output: rm32044_t1.pkey, rm32044_t1.val, rm32044_t2.pkey, rm32044_t2.label, rm32044_t2.hidden, rm32044_t3.pkey, rm32044_t3.val, rm32044_t4.pkey
         Merge Cond: (rm32044_t1.pkey = rm32044_t2.pkey)
         ->  Sort
               Output: rm32044_t3.pkey, rm32044_t3.val, rm32044_t4.pkey, rm32044_t1.pkey, rm32044_t1.val
               Sort Key: rm32044_t1.pkey
               ->  Nested Loop
                     Output: rm32044_t3.pkey, rm32044_t3.val, rm32044_t4.pkey, rm32044_t1.pkey, rm32044_t1.val
                     ->  Merge Left Join
                           Output: rm32044_t3.pkey, rm32044_t3.val, rm32044_t4.pkey
                           Merge Cond: (rm32044_t3.pkey = rm32044_t4.pkey)
                           ->  Sort
                                 Output: rm32044_t3.pkey, rm32044_t3.val
                                 Sort Key: rm32044_t3.pkey
                                 ->  Seq Scan on public.rm32044_t3
                                       Output: rm32044_t3.pkey, rm32044_t3.val
                           ->  Sort
                                 Output: rm32044_t4.pkey
                                 Sort Key: rm32044_t4.pkey
                                 ->  Seq Scan on public.rm32044_t4
                                       Output: rm32044_t4.pkey
                     ->  Materialize
                           Output: rm32044_t1.pkey, rm32044_t1.val
                           ->  Seq Scan on public.rm32044_t1
                                 Output: rm32044_t1.pkey, rm32044_t1.val
         ->  Sort
               Output: rm32044_t2.pkey, rm32044_t2.label, rm32044_t2.hidden
               Sort Key: rm32044_t2.pkey
               ->  Seq Scan on public.rm32044_t2
                     Output: rm32044_t2.pkey, rm32044_t2.label, rm32044_t2.hidden
(34 rows)

It seems that while building the inner slot for the merge join, the value
for attnum 1 is not being fetched correctly.

On Tue, Sep 12, 2023 at 3:27 PM Suraj Kharage <
suraj(dot)kharage(at)enterprisedb(dot)com> wrote:

> Hi,
>
> Found a server crash on the RHEL 9/s390x platform with the test case below.
>
> *Machine details:*
> [edb@9428da9d2137 postgres]$ cat /etc/redhat-release
> AlmaLinux release 9.2 (Turquoise Kodkod)
> [edb@9428da9d2137 postgres]$ lscpu
> Architecture: s390x
> CPU op-mode(s): 32-bit, 64-bit
> Address sizes: 39 bits physical, 48 bits virtual
> Byte Order: Big Endian
>
> *Configure command:*
> ./configure --prefix=/home/edb/postgres/ --with-lz4 --with-zstd
> --with-llvm --with-perl --with-python --with-tcl --with-openssl
> --enable-nls --with-libxml --with-libxslt --with-systemd --with-libcurl
> --without-icu --enable-debug --enable-cassert --with-pgport=5414
>
>
> *Test case:*
> CREATE TABLE rm32044_t1
> (
> pkey integer,
> val text
> );
> CREATE TABLE rm32044_t2
> (
> pkey integer,
> label text,
> hidden boolean
> );
> CREATE TABLE rm32044_t3
> (
> pkey integer,
> val integer
> );
> CREATE TABLE rm32044_t4
> (
> pkey integer
> );
> insert into rm32044_t1 values ( 1 , 'row1');
> insert into rm32044_t1 values ( 2 , 'row2');
> insert into rm32044_t2 values ( 1 , 'hidden', true);
> insert into rm32044_t2 values ( 2 , 'visible', false);
> insert into rm32044_t3 values (1 , 1);
> insert into rm32044_t3 values (2 , 1);
>
> postgres=# SELECT * FROM rm32044_t1 LEFT JOIN rm32044_t2 ON
> rm32044_t1.pkey = rm32044_t2.pkey, rm32044_t3 LEFT JOIN rm32044_t4 ON
> rm32044_t3.pkey = rm32044_t4.pkey order by rm32044_t1.pkey,label,hidden;
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> The connection to the server was lost. Attempting reset: Failed.
>
> *backtrace:*
> [edb@9428da9d2137 postgres]$ gdb bin/postgres
> data/qemu_postgres_20230911-140628_65620.core
> Core was generated by `postgres: edb postgres [local] SELECT '.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0 0x00000000010a8366 in heap_compute_data_size (tupleDesc=tupleDesc@entry=0x1ba3d10,
> values=values@entry=0x1ba4168, isnull=isnull@entry=0x1ba41a8) at
> heaptuple.c:227
> 227 VARATT_CAN_MAKE_SHORT(DatumGetPointer(val)))
> [Current thread is 1 (LWP 65597)]
> Missing separate debuginfos, use: dnf debuginfo-install
> glibc-2.34-60.el9.s390x libcap-2.48-8.el9.s390x
> libedit-3.1-37.20210216cvs.el9.s390x libffi-3.4.2-7.el9.s390x
> libgcc-11.3.1-4.3.el9.alma.s390x libgcrypt-1.10.0-10.el9_2.s390x
> libgpg-error-1.42-5.el9.s390x libstdc++-11.3.1-4.3.el9.alma.s390x
> libxml2-2.9.13-3.el9_2.1.s390x libzstd-1.5.1-2.el9.s390x
> llvm-libs-15.0.7-1.el9.s390x lz4-libs-1.9.3-5.el9.s390x
> ncurses-libs-6.2-8.20210508.el9.s390x openssl-libs-3.0.7-17.el9_2.s390x
> systemd-libs-252-14.el9_2.3.s390x xz-libs-5.2.5-8.el9_0.s390x
> (gdb) bt
> #0 0x00000000010a8366 in heap_compute_data_size (tupleDesc=tupleDesc@entry=0x1ba3d10,
> values=values@entry=0x1ba4168, isnull=isnull@entry=0x1ba41a8) at
> heaptuple.c:227
> #1 0x00000000010a9bb0 in heap_form_minimal_tuple
> (tupleDescriptor=0x1ba3d10, values=0x1ba4168, isnull=0x1ba41a8) at
> heaptuple.c:1484
> #2 0x00000000016553fa in ExecCopySlotMinimalTuple (slot=<optimized out>)
> at ../../../../src/include/executor/tuptable.h:472
> #3 tuplesort_puttupleslot (state=state@entry=0x1be4d18, slot=slot@entry=0x1ba4120)
> at tuplesortvariants.c:610
> #4 0x00000000012dc0e0 in ExecIncrementalSort (pstate=0x1acb4d8) at
> nodeIncrementalSort.c:716
> #5 0x00000000012b32c6 in ExecProcNode (node=0x1acb4d8) at
> ../../../src/include/executor/executor.h:273
> #6 ExecutePlan (execute_once=<optimized out>, dest=0x1ade698,
> direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>,
> operation=CMD_SELECT, use_parallel_mode=<optimized out>,
> planstate=0x1acb4d8, estate=0x1acb258) at execMain.c:1670
> #7 standard_ExecutorRun (queryDesc=0x19ad338, direction=<optimized out>,
> count=0, execute_once=<optimized out>) at execMain.c:365
> #8 0x00000000014a6ae2 in PortalRunSelect (portal=portal@entry=0x1a63558,
> forward=forward@entry=true, count=0, count@entry=9223372036854775807,
> dest=dest@entry=0x1ade698) at pquery.c:924
> #9 0x00000000014a84e0 in PortalRun (portal=portal@entry=0x1a63558,
> count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true,
> run_once=run_once@entry=true, dest=dest@entry=0x1ade698,
> altdest=0x1ade698, qc=0x40007ff7b0) at pquery.c:768
> #10 0x00000000014a3c1c in exec_simple_query (
> query_string=0x19ea0e8 "SELECT * FROM rm32044_t1 LEFT JOIN rm32044_t2
> ON rm32044_t1.pkey = rm32044_t2.pkey, rm32044_t3 LEFT JOIN rm32044_t4 ON
> rm32044_t3.pkey = rm32044_t4.pkey order by rm32044_t1.pkey,label,hidden;")
> at postgres.c:1274
> #11 0x00000000014a57aa in PostgresMain (dbname=<optimized out>,
> username=<optimized out>) at postgres.c:4637
> #12 0x00000000013fdaf6 in BackendRun (port=0x1a132c0, port=0x1a132c0) at
> postmaster.c:4464
> #13 BackendStartup (port=0x1a132c0) at postmaster.c:4192
> #14 ServerLoop () at postmaster.c:1782
> #15 0x00000000013fec34 in PostmasterMain (argc=argc@entry=3,
> argv=argv@entry=0x19a59a0) at postmaster.c:1466
> #16 0x0000000001096faa in main (argc=<optimized out>, argv=0x19a59a0) at
> main.c:198
>
> (gdb) p val
> $1 = 0
>
> Does anybody have any idea about this?
>
> --
> --
>
> Thanks & Regards,
> Suraj kharage,
>
>
>
> edbpostgres.com
>

--
--

Thanks & Regards,
Suraj kharage,

edbpostgres.com
