BUG #16754: When using LLVM and parallel queries aborted all session by pg_cancel_backend.

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: noriyoshi(dot)shinoda(at)hpe(dot)com
Subject: BUG #16754: When using LLVM and parallel queries aborted all session by pg_cancel_backend.
Date: 2020-12-01 05:43:56
Message-ID: 16754-4a800d382f5cfae9@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 16754
Logged by: Noriyoshi Shinoda
Email address: noriyoshi(dot)shinoda(at)hpe(dot)com
PostgreSQL version: 13.1
Operating system: Red Hat Enterprise Linux 7.8 (x86-64)
Description:

OS:
Red Hat Enterprise Linux 7.8 (x86-64)

Version:
PostgreSQL 13.1
cfe-7.1.0
llvm-7.1.0
cmake-3.18.4

Configure:
configure --prefix=$HOME/pg131 --with-llvm --enable-debug
--enable-cassert

$ pg_config
BINDIR = /home/postgres/pg131/bin
DOCDIR = /home/postgres/pg131/share/doc
HTMLDIR = /home/postgres/pg131/share/doc
INCLUDEDIR = /home/postgres/pg131/include
PKGINCLUDEDIR = /home/postgres/pg131/include
INCLUDEDIR-SERVER = /home/postgres/pg131/include/server
LIBDIR = /home/postgres/pg131/lib
PKGLIBDIR = /home/postgres/pg131/lib
LOCALEDIR = /home/postgres/pg131/share/locale
MANDIR = /home/postgres/pg131/share/man
SHAREDIR = /home/postgres/pg131/share
SYSCONFDIR = /home/postgres/pg131/etc
PGXS = /home/postgres/pg131/lib/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--prefix=/home/postgres/pg131' '--with-llvm' '--enable-debug'
'--enable-cassert'
CC = gcc -std=gnu99
CPPFLAGS = -D_GNU_SOURCE
CFLAGS = -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Werror=vla -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv
-fexcess-precision=standard -g -O2
CFLAGS_SL = -fPIC
LDFLAGS = -L/usr/local/lib -Wl,--as-needed
-Wl,-rpath,'/home/postgres/pg131/lib',--enable-new-dtags
LDFLAGS_EX =
LDFLAGS_SL =
LIBS = -lpgcommon -lpgport -lpthread -lz -lreadline -lrt -ldl -lm
VERSION = PostgreSQL 13.1

- Prepare
$ initdb --no-locale --encoding=utf8 data.131
$ pg_ctl -D data.131 start
$ psql
postgres=# SELECT version();
version
---------------------------------------------------------------------------------------------------------
PostgreSQL 13.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5
20150623 (Red Hat 4.8.5-39), 64-bit
(1 row)
postgres=# CREATE USER demo PASSWORD 'demo';
CREATE ROLE
postgres=# \connect postgres demo
You are now connected to database "postgres" as user "demo".
postgres=> CREATE TABLE data1(c1 NUMERIC, c2 VARCHAR(10));
CREATE TABLE
postgres=> INSERT INTO data1 VALUES (generate_series(1, 1000000),
'data1');
INSERT 0 1000000
postgres=> ANALYZE data1;
ANALYZE
postgres=> \d+
List of relations
Schema | Name | Type | Owner | Persistence | Size | Description
--------+-------+-------+-------+-------------+-------+-------------
public | data1 | table | demo | permanent | 42 MB |
(1 row)

- Operation
-- Normal case (No JIT/Cancel parallel worker)
--- Session#1
$ psql postgres demo
postgres=> SET force_parallel_mode = on;
SET
postgres=> SET jit = off;
SET
postgres=> EXPLAIN SELECT * FROM data1 d1, data1 d2;
QUERY PLAN
-------------------------------------------------------------------------------------
Gather (cost=1000.00..117383034312.00 rows=1000000000000 width=24)
Workers Planned: 1
Single Copy: true
-> Nested Loop (cost=0.00..17383033312.00 rows=1000000000000
width=24)
-> Seq Scan on data1 d1 (cost=0.00..15406.00 rows=1000000
width=12)
-> Materialize (cost=0.00..25289.00 rows=1000000 width=12)
-> Seq Scan on data1 d2 (cost=0.00..15406.00 rows=1000000
width=12)
(7 rows)

postgres=> EXPLAIN ANALYZE SELECT * FROM data1 d1, data1 d2; <- Execute
pg_cancel_backend(130457) on Session#2
ERROR: canceling statement due to user request
CONTEXT: parallel worker

--- Session#2
$ psql
postgres=# SELECT pid, leader_pid FROM pg_stat_activity WHERE
usename='demo';
pid | leader_pid
--------+------------
130413 |
130457 | 130413
(2 rows)

postgres=# SELECT pg_cancel_backend(130457);
pg_cancel_backend
-------------------
t
(1 row)

-- NG case (With JIT / Cancel parallel worker)
--- Session#1

$ psql postgres demo
postgres=> SET force_parallel_mode = on;
SET
postgres=> SET jit = on;
SET
postgres=> EXPLAIN SELECT * FROM data1 d1, data1 d2;
QUERY PLAN
-------------------------------------------------------------------------------------
Gather (cost=1000.00..117383034312.00 rows=1000000000000 width=24)
Workers Planned: 1
Single Copy: true
-> Nested Loop (cost=0.00..17383033312.00 rows=1000000000000
width=24)
-> Seq Scan on data1 d1 (cost=0.00..15406.00 rows=1000000
width=12)
-> Materialize (cost=0.00..25289.00 rows=1000000 width=12)
-> Seq Scan on data1 d2 (cost=0.00..15406.00 rows=1000000
width=12)
JIT:
Functions: 3
Options: Inlining true, Optimization true, Expressions true, Deforming
true
(10 rows)

postgres=> EXPLAIN ANALYZE SELECT * FROM data1 d1, data1 d2; <- Execute
pg_cancel_backend(130342);
WARNING: terminating connection because of crash of another server
process
DETAIL: The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and
repeat your command.
ERROR: canceling statement due to user request
CONTEXT: parallel worker
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!?>

--- Session#2
postgres=# SELECT pid, leader_pid FROM pg_stat_activity WHERE
usename='demo';
pid | leader_pid
--------+------------
130322 |
130342 | 130322
(2 rows)

postgres=# SELECT pg_cancel_backend(130342);
pg_cancel_backend
-------------------
t
(1 row)

- log file
2020-12-01 14:26:02.337 JST [130270] LOG: background worker "parallel
worker" (PID 130342) was terminated by signal 11: Segmentation fault
2020-12-01 14:26:02.337 JST [130270] DETAIL: Failed process was running:
EXPLAIN ANALYZE SELECT * FROM data1 d1, data1 d2;
2020-12-01 14:26:02.337 JST [130270] LOG: terminating any other active
server processes
2020-12-01 14:26:02.337 JST [130322] WARNING: terminating connection
because of crash of another server process
2020-12-01 14:26:02.337 JST [130322] DETAIL: The postmaster has commanded
this server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2020-12-01 14:26:02.337 JST [130322] HINT: In a moment you should be able
to reconnect to the database and repeat your command.
2020-12-01 14:26:02.338 JST [130276] WARNING: terminating connection
because of crash of another server process
2020-12-01 14:26:02.338 JST [130276] DETAIL: The postmaster has commanded
this server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2020-12-01 14:26:02.338 JST [130276] HINT: In a moment you should be able
to reconnect to the database and repeat your command.
2020-12-01 14:26:02.340 JST [130345] FATAL: the database system is in
recovery mode
2020-12-01 14:26:02.343 JST [130319] WARNING: terminating connection
because of crash of another server process
2020-12-01 14:26:02.343 JST [130319] DETAIL: The postmaster has commanded
this server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2020-12-01 14:26:02.343 JST [130319] HINT: In a moment you should be able
to reconnect to the database and repeat your command.
2020-12-01 14:26:02.343 JST [130270] LOG: all server processes terminated;
reinitializing
2020-12-01 14:26:02.355 JST [130348] LOG: database system was interrupted;
last known up at 2020-12-01 14:21:07 JST
2020-12-01 14:26:02.360 JST [130348] LOG: database system was not properly
shut down; automatic recovery in progress
2020-12-01 14:26:02.360 JST [130348] LOG: redo starts at 0/15C07D0
2020-12-01 14:26:02.767 JST [130348] LOG: invalid record length at
0/5B10A90: wanted 24, got 0
2020-12-01 14:26:02.767 JST [130348] LOG: redo done at 0/5B10A58
2020-12-01 14:26:02.821 JST [130270] LOG: database system is ready to
accept connections

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Michael Paquier 2020-12-01 06:01:33 Re: BUG #16663: DROP INDEX did not free up disk space: idle connection hold file marked as deleted
Previous Message Thomas Munro 2020-12-01 03:06:48 Re: BUG #16663: DROP INDEX did not free up disk space: idle connection hold file marked as deleted