Re: signal 11 segfaults with parallel workers

From:	Rick Otten <rottenwindfish(at)gmail(dot)com>
To:	PostgreSQL mailing lists <pgsql-bugs(at)postgresql(dot)org>
Subject:	Re: signal 11 segfaults with parallel workers
Date:	2017-07-31 01:05:50
Message-ID:	CAMAYy4LwudXQ326o-xZdf2WZiWrA8iu8S6FNPxcPtvPN0b1xRw@mail.gmail.com
Lists:	pgsql-bugs

Ok, I got a core this time at 23:00 when the database went down.
Here is the basic backtrace:

$ gdb /usr/lib/postgresql/9.6/bin/postgres core
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html
>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/lib/postgresql/9.6/bin/postgres...Reading symbols
from
/usr/lib/debug/.build-id/32/108810b4ff9528a94d48315dd9333c501fc52d.debug...done.
done.
[New LWP 4294]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: bgworker: parallel worker f'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 MemoryContextAlloc (context=0x0, size=size@entry=1024) at
/build/postgresql-9.6-5bnRDZ/postgresql-9.6-9.6.3/build/../src/backend/utils/mmgr/mcxt.c:761
761 /build/postgresql-9.6-5bnRDZ/postgresql-9.6-9.6.3/build/../src/backend/utils/mmgr/mcxt.c:
No such file or directory.
(gdb) bt
#0 MemoryContextAlloc (context=0x0, size=size@entry=1024) at
/build/postgresql-9.6-5bnRDZ/postgresql-9.6-9.6.3/build/../src/backend/utils/mmgr/mcxt.c:761
#1 0x0000560b7a518ec4 in SPI_connect () at
/build/postgresql-9.6-5bnRDZ/postgresql-9.6-9.6.3/build/../src/backend/executor/spi.c:102
#2 0x00007fec467b9261 in _PG_init () from
/usr/lib/postgresql/9.6/lib/multicorn.so
#3 0x0000560b7a717cf2 in internal_load_library
(libname=libname@entry=0x7ff48208dbf8
<error: Cannot access memory at address 0x7ff48208dbf8>)
at
/build/postgresql-9.6-5bnRDZ/postgresql-9.6-9.6.3/build/../src/backend/utils/fmgr/dfmgr.c:276
#4 0x0000560b7a7188c0 in RestoreLibraryState (start_address=0x7ff48208dbf8
<error: Cannot access memory at address 0x7ff48208dbf8>)
at
/build/postgresql-9.6-5bnRDZ/postgresql-9.6-9.6.3/build/../src/backend/utils/fmgr/dfmgr.c:741
#5 0x0000560b7a3ee4f7 in ParallelWorkerMain (main_arg=<optimized out>)
at
/build/postgresql-9.6-5bnRDZ/postgresql-9.6-9.6.3/build/../src/backend/access/transam/parallel.c:1065
#6 0x0000560b7a59ae29 in StartBackgroundWorker () at
/build/postgresql-9.6-5bnRDZ/postgresql-9.6-9.6.3/build/../src/backend/postmaster/bgworker.c:742
#7 0x0000560b7a5a701b in do_start_bgworker (rw=<optimized out>)
at
/build/postgresql-9.6-5bnRDZ/postgresql-9.6-9.6.3/build/../src/backend/postmaster/postmaster.c:5579
#8 maybe_start_bgworkers () at
/build/postgresql-9.6-5bnRDZ/postgresql-9.6-9.6.3/build/../src/backend/postmaster/postmaster.c:5776
#9 0x0000560b7a5a7cd5 in sigusr1_handler (postgres_signal_arg=<optimized
out>)
at
/build/postgresql-9.6-5bnRDZ/postgresql-9.6-9.6.3/build/../src/backend/postmaster/postmaster.c:4973
#10 <signal handler called>
#11 0x00007ff480425573 in __select_nocancel () at
../sysdeps/unix/syscall-template.S:84
#12 0x0000560b7a3858ef in ServerLoop () at
/build/postgresql-9.6-5bnRDZ/postgresql-9.6-9.6.3/build/../src/backend/postmaster/postmaster.c:1679
#13 0x0000560b7a5a9053 in PostmasterMain (argc=1, argv=<optimized out>)
at
/build/postgresql-9.6-5bnRDZ/postgresql-9.6-9.6.3/build/../src/backend/postmaster/postmaster.c:1323
#14 0x0000560b7a387511 in main (argc=1, argv=0x560b7ba23630) at
/build/postgresql-9.6-5bnRDZ/postgresql-9.6-9.6.3/build/../src/backend/main/main.c:228
(gdb)

The query that took it down this time (based on the pid reported in the
stacktrace) does indeed spin out a parallel plan, but it is a simple
query. I was surprised to see the multicorn library mentioned in this
trace; the query has nothing to do with the multicorn FDW installed on the
system.

I've run the query several times in the last few minutes and can't get it
to generate a core again.
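
Because the backtrace is wrapped across many lines, the call chain is
easier to follow once each frame is collapsed onto one line. The following
is a minimal sketch of that, using only the four innermost frames copied
from the trace. The helper name summarize_frames and both regular
expressions are illustrative assumptions, not part of gdb or PostgreSQL.

```python
import re

# Innermost frames copied from the backtrace above (line wrapping preserved).
BACKTRACE = """\
#0 MemoryContextAlloc (context=0x0, size=size@entry=1024) at
/build/postgresql-9.6-5bnRDZ/postgresql-9.6-9.6.3/build/../src/backend/utils/mmgr/mcxt.c:761
#1 0x0000560b7a518ec4 in SPI_connect () at
/build/postgresql-9.6-5bnRDZ/postgresql-9.6-9.6.3/build/../src/backend/executor/spi.c:102
#2 0x00007fec467b9261 in _PG_init () from
/usr/lib/postgresql/9.6/lib/multicorn.so
#3 0x0000560b7a717cf2 in internal_load_library
(libname=libname@entry=0x7ff48208dbf8
<error: Cannot access memory at address 0x7ff48208dbf8>)
at
/build/postgresql-9.6-5bnRDZ/postgresql-9.6-9.6.3/build/../src/backend/utils/fmgr/dfmgr.c:276
"""

# Hypothetical patterns: frame number plus function name, and the
# "from <library>.so" suffix gdb prints for frames inside a shared library.
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-f]+ in )?(\S+)")
LIB_RE = re.compile(r"\bfrom (\S+\.so\S*)")


def summarize_frames(text):
    """Return (frame_number, function, shared_library_or_None) tuples."""
    # A new frame starts at every line beginning with "#<n>"; any other
    # line is a wrapped continuation of the previous frame.
    frames = []
    for line in text.splitlines():
        if re.match(r"#\d+\s", line):
            frames.append(line.strip())
        elif frames:
            frames[-1] += " " + line.strip()

    summary = []
    for frame in frames:
        match = FRAME_RE.match(frame)
        if match is None:
            continue
        lib = LIB_RE.search(frame)
        summary.append(
            (int(match.group(1)), match.group(2), lib.group(1) if lib else None)
        )
    return summary


if __name__ == "__main__":
    for number, function, library in summarize_frames(BACKTRACE):
        print(number, function, library or "(postgres binary)")
```

Read this way, frame #2 is the only one attributed to a shared library:
_PG_init in multicorn.so calls SPI_connect, which reaches
MemoryContextAlloc with context=0x0. Frame #3 suggests the worker was
loading the library at startup rather than because the query touched a
foreign table, though that is only an inference from the trace.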

On Sun, Jul 30, 2017 at 5:25 PM, Rick Otten <rottenwindfish(at)gmail(dot)com>
wrote:

> Well, I'm not sure how to inspect the temp tablespace other than from the
> filesystem itself. I have it configured on its own disk. Usually the disk
> space ebbs and flows with query activity. Since we've been crashing
> however, it never reclaims the disk that was in use just before the crash.
> So our temp space 'floor' keeps getting higher and higher.
>
> At least that is what it has been doing for the past week or two, and what
> it looked like this morning. Now that the database has been back up for 8
> or 9 hours following this controlled restart, I just went to look at it,
> and all of the temp space has been reclaimed - for the first time since the
> crashing started. ... Interesting...
>
>
> On Sun, Jul 30, 2017 at 11:22 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>> Rick Otten <rottenwindfish(at)gmail(dot)com> writes:
>> > One thing that is bugging me is I think when the database crashes, it
>> > doesn't clean up the temp_tablespace(s).
>>
>> Hm, interesting, what do you see in there?
>>
>> regards, tom lane
>>
>
>

In response to

Re: signal 11 segfaults with parallel workers at 2017-07-30 21:25:41 from Rick Otten

Responses

Re: signal 11 segfaults with parallel workers at 2017-07-31 02:47:54 from Amit Kapila
Re: signal 11 segfaults with parallel workers at 2017-07-31 02:56:16 from Andres Freund
