Segfault 11 on PG10 with max_parallel_workers_per_gather>3

From: Stefan Tzeggai <tzeggai(at)empirica-systeme(dot)de>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Segfault 11 on PG10 with max_parallel_workers_per_gather>3
Date: 2017-10-25 12:16:39
Message-ID: 665b0747-6bdd-2da9-2ca3-ee06d0e71379@empirica-systeme.de
Lists: pgsql-bugs

Hi

I can reproduce a segfault by executing a query.

I run PostgreSQL 10.0-1.pgdg16.04+1 on Ubuntu 16.04.3.

The machine has hyperthreading enabled and 48 virtual cores: 2xE5-2690v3

I have a materialized view:

refresh materialized view concurrently ;
--works

results.as_20171025_20170930_ut78777;
--works

set max_parallel_workers_per_gather to 0;
SELECT count(1) FROM results.as_20171025_20170930_ut78777 RT WHERE
(((oadr_gkz IN
(2000000,5111000,5314000,5315000,5334002,5515000,6411000,6412000,7315000,8111000,8221000,9162000,9184119,11000000,14612000))
AND (objekttyp_grob IN (1)) AND (startdate>='2012-01-01' OR enddate IS
NULL OR enddate>='2012-01-01')) OR ((oadr_gkz IN (2000000))));
--works: 129587

set max_parallel_workers_per_gather to 3;
SELECT count(1) FROM results.as_20171025_20170930_ut78777 RT WHERE
(((oadr_gkz IN
(2000000,5111000,5314000,5315000,5334002,5515000,6411000,6412000,7315000,8111000,8221000,9162000,9184119,11000000,14612000))
AND (objekttyp_grob IN (1)) AND (startdate>='2012-01-01' OR enddate IS
NULL OR enddate>='2012-01-01')) OR ((oadr_gkz IN (2000000))));
--works: 129587

set max_parallel_workers_per_gather to 4;
SELECT count(1) FROM results.as_20171025_20170930_ut78777 RT WHERE
(((oadr_gkz IN
(2000000,5111000,5314000,5315000,5334002,5515000,6411000,6412000,7315000,8111000,8221000,9162000,9184119,11000000,14612000))
AND (objekttyp_grob IN (1)) AND (startdate>='2012-01-01' OR enddate IS
NULL OR enddate>='2012-01-01')) OR ((oadr_gkz IN (2000000))));
--SEGFAULT!

set max_parallel_workers_per_gather to 4;
explain SELECT count(1) FROM results.as_20171025_20170930_ut78777 RT
WHERE ((((oart_zwangsversteigerung_janein IS NULL)) AND (oadr_gkz IN
(2000000,5111000,5314000,5315000,5334002,5515000,6411000,6412000,7315000,8111000,8221000,9162000,9184119,11000000,14612000))
AND (objekttyp_grob IN (1)) AND (startdate>='2012-01-01' OR enddate IS
NULL OR enddate>='2012-01-01')) OR (((oart_zwangsversteigerung_janein IS
NULL)) AND (oadr_gkz IN (2000000))))

"Finalize Aggregate  (cost=186411.37..186411.38 rows=1 width=8)"
"  ->  Gather  (cost=186410.95..186411.36 rows=4 width=8)"
"        Workers Planned: 4"
"        ->  Partial Aggregate  (cost=185410.95..185410.96 rows=1 width=8)"
"              ->  Parallel Bitmap Heap Scan on as_20171025_20170930_ut78777 rt  (cost=12058.69..185353.14 rows=23121 width=0)"
"                    Recheck Cond: (((oadr_gkz = ANY ('{2000000,5111000,5314000,5315000,5334002,5515000,6411000,6412000,7315000,8111000,8221000,9162000,9184119,11000000,14612000}'::integer[])) AND (objekttyp_grob = 1)) OR (oadr_gkz = 2000000))"
"                    Filter: ((oart_zwangsversteigerung_janein IS NULL) AND (((oadr_gkz = ANY ('{2000000,5111000,5314000,5315000,5334002,5515000,6411000,6412000,7315000,8111000,8221000,9162000,9184119,11000000,14612000}'::integer[])) AND (objekttyp_grob = 1 (...)"
"                    ->  BitmapOr  (cost=12058.69..12058.69 rows=94046 width=0)"
"                          ->  BitmapAnd  (cost=11726.20..11726.20 rows=76321 width=0)"
"                                ->  Bitmap Index Scan on as_20171025_20170930_ut78777_oadr_gkz_wnnidx  (cost=0.00..3129.41 rows=185997 width=0)"
"                                      Index Cond: (oadr_gkz = ANY ('{2000000,5111000,5314000,5315000,5334002,5515000,6411000,6412000,7315000,8111000,8221000,9162000,9184119,11000000,14612000}'::integer[]))"
"                                ->  Bitmap Index Scan on as_20171025_20170930_ut78777_objekttyp_grob_idx  (cost=0.00..8550.30 rows=491449 width=0)"
"                                      Index Cond: (objekttyp_grob = 1)"
"                          ->  Bitmap Index Scan on as_20171025_20170930_ut78777_oadr_gkz_wnnidx  (cost=0.00..309.37 rows=17726 width=0)"
"                                Index Cond: (oadr_gkz = 2000000)"

And the postgresql-10.log says:

>2017-10-25 13:45:35.149 CEST [6345] LOG: server process (PID 25637) was terminated by signal 11: Segmentation fault
>2017-10-25 13:45:35.149 CEST [6345] DETAIL: Failed process was running:
...
>2017-10-25 13:42:14.332 CEST [25629] LOG: redo starts at 108/449A9D98
>2017-10-25 13:42:14.396 CEST [25629] LOG: unexpected pageaddr 107/6F8CC000 in log segment 000000010000010800000045, offset 9224192
>2017-10-25 13:42:14.396 CEST [25629] LOG: redo done at 108/458CA968

I upgraded PostgreSQL using pg_upgrade with hard links a few days ago.
This view was not carried over from PG9.6 to 10; it was created
freshly on PG10 this morning.

Other related settings in postgresql.conf are:
>max_worker_processes = 12
>max_parallel_workers_per_gather = 4
>max_parallel_workers = 12
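
A minimal sketch, assuming a psql session on the affected server: the standard pg_settings view shows the values actually in effect, including any session-level override made with SET. The query is only an illustration and was not part of the test run described above.

```sql
-- Show the effective values of the parallel-query settings listed above.
-- "source" indicates where each value comes from
-- (for example the configuration file, a session-level SET, or the built-in default).
SELECT name, setting, source
FROM pg_settings
WHERE name IN ('max_worker_processes',
               'max_parallel_workers',
               'max_parallel_workers_per_gather')
ORDER BY name;
```

Running it right after each "set max_parallel_workers_per_gather" statement would confirm which value the session is really using when the query is executed.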

So what I have figured out is that it only crashes when I increase
max_parallel_workers_per_gather to more than 3.

I have probably misunderstood some of the max_parallel_* settings and am
doing something bogus, but the database should probably not segfault...

How can I help you with more information?

Steve
