Re: BUG #14180: Segmentation fault on replication slave

From: Bo Ørsted Andresen <boa(at)neogrid(dot)dk>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #14180: Segmentation fault on replication slave
Date: 2016-06-08 11:46:11
Message-ID: VI1PR04MB1488D19F84ADF821932A0218CB5E0@VI1PR04MB1488.eurprd04.prod.outlook.com
Lists: pgsql-bugs

> > On 2016-06-07 20:08, Tom Lane wrote:
> > I think the reason for the lack of useful backtrace info is that we've
> > smashed the stack. Note that the original report shows i == 3324
> > which is much larger than the available length of the local items[] array
> > (408).
> > So presumably, the passed-in "len" was bogus (much too large).
> >
> > If you're prepared to build a custom version of Postgres, you could
> > try adding this to _bt_restore_page():
> >
> > /* Need to copy tuple header due to alignment considerations */
> > memcpy(&itupdata, from, sizeof(IndexTupleData));
> > itemsz = IndexTupleDSize(itupdata);
> > itemsz = MAXALIGN(itemsz);
> >
> > + if (i >= lengthof(items))
> > + elog(PANIC, "too many items on btree page");
> > +
> > items[i] = (Item) from;
> > itemsizes[i] = itemsz;
> > i++;
> >
> > from += itemsz;
> >
> > and then you should get a core dump before the stack is clobbered.
> >
> > I wonder whether we shouldn't add such a check to the regular sources...

Logged:

LOG: started streaming WAL from primary at 631/7000000 on timeline 1
PANIC: too many items on btree page
CONTEXT: xlog redo Btree/SPLIT_R: level 0, firstright 139
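
For readers following the thread, the effect of the suggested guard can be illustrated in isolation. The following is a minimal standalone sketch, not PostgreSQL source: `ItemHeader`, `ALIGN8`, and `collect_items` are hypothetical names invented for the illustration, and the function returns -1 where the patched `_bt_restore_page()` would call `elog(PANIC, ...)`. It walks a packed buffer of variable-length items and refuses to store a pointer once the caller's array is full, which is the condition the PANIC above reports.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round n up to a multiple of 8 (stand-in for MAXALIGN; assumption). */
#define ALIGN8(n) (((n) + 7u) & ~(size_t) 7u)

/* Hypothetical item header: the first two bytes hold the item's total size. */
typedef struct
{
    uint16_t size;
} ItemHeader;

/*
 * Record a pointer to each item in the packed buffer [from, from + len).
 * Returns the number of items found, or -1 if the buffer is malformed or
 * holds more items than items[] has room for.
 */
static int
collect_items(const char *from, size_t len,
              const char *items[], size_t capacity)
{
    const char *end = from + len;
    size_t      i = 0;

    while (from < end)
    {
        ItemHeader  hdr;
        size_t      itemsz;

        /* A truncated header means the length we were given is bogus. */
        if ((size_t) (end - from) < sizeof(hdr))
            return -1;

        /* Copy the header out, since the source may be unaligned. */
        memcpy(&hdr, from, sizeof(hdr));
        itemsz = ALIGN8((size_t) hdr.size);

        /* Reject zero-length items and items that run past the buffer. */
        if (itemsz == 0 || itemsz > (size_t) (end - from))
            return -1;

        /* The guard: stop before writing past the end of items[]. */
        if (i >= capacity)
            return -1;

        items[i++] = from;
        from += itemsz;
    }

    return (int) i;
}
```

Without the capacity check, a bogus length would let the loop keep writing into `items[]` well past its end, which matches the stack-smashing explanation given in the quoted message.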

Backtrace:

# gdb -p 10069
GNU gdb (Ubuntu 7.11-0ubuntu1) 7.11
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 10069
Reading symbols from /usr/local/pgsql/bin/postgres...done.
Reading symbols from /lib/x86_64-linux-gnu/librt.so.1...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/librt-2.23.so...done.
done.
Reading symbols from /lib/x86_64-linux-gnu/libdl.so.2...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libdl-2.23.so...done.
done.
Reading symbols from /lib/x86_64-linux-gnu/libm.so.6...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libm-2.23.so...done.
done.
Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.23.so...done.
done.
Reading symbols from /lib/x86_64-linux-gnu/libpthread.so.0...Reading symbols from /usr/lib/debug/.build-id/b7/7847cc9cacbca3b5753d0d25a32e5795afe75b.debug...done.
done.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/ld-2.23.so...done.
done.
Reading symbols from /lib/x86_64-linux-gnu/libnss_files.so.2...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libnss_files-2.23.so...done.
done.
0x00007ffff73f3e70 in __poll_nocancel () at ../sysdeps/unix/syscall-template.S:84
84 ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) set pagination off
(gdb) set logging file /tmp/debuglog-20160608-2.txt
(gdb) set logging on
Copying output to /tmp/debuglog-20160608-2.txt.
(gdb) handle SIGUSR1 nostop
Signal Stop Print Pass to program Description
SIGUSR1 No Yes Yes User defined signal 1
(gdb) handle SIGUSR1 noprint
Signal Stop Print Pass to program Description
SIGUSR1 No No Yes User defined signal 1
(gdb) cont
Continuing.
(gdb)
Program received signal SIGABRT, Aborted.
0x00007ffff732e418 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
54 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x00007ffff732e418 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1 0x00007ffff733001a in __GI_abort () at abort.c:89
#2 0x000000000078ccaa in errfinish (dummy=dummy@entry=0) at elog.c:551
#3 0x000000000079074a in elog_finish (elevel=elevel@entry=22, fmt=fmt@entry=0x7cb187 "too many items on btree page") at elog.c:1368
#4 0x00000000004ae437 in _bt_restore_page (page=page@entry=0x7fffefa2cb40 "", from=<optimized out>, from@entry=0xc52e70 "\036", len=<optimized out>) at nbtxlog.c:58
#5 0x00000000004ae8a4 in btree_xlog_split (onleft=onleft@entry=0 '\000', isroot=isroot@entry=0 '\000', record=record@entry=0xc3b840) at nbtxlog.c:241
#6 0x00000000004aee1c in btree_redo (record=0xc3b840) at nbtxlog.c:984
#7 0x00000000004d5c2b in StartupXLOG () at xlog.c:6825
#8 0x000000000064e212 in StartupProcessMain () at startup.c:215
#9 0x00000000004e3168 in AuxiliaryProcessMain (argc=argc@entry=2, argv=argv@entry=0x7fffffffe3e0) at bootstrap.c:418
#10 0x000000000064b698 in StartChildProcess (type=StartupProcess) at postmaster.c:5199
#11 0x000000000064dc84 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0xc1b9f0) at postmaster.c:1284
#12 0x0000000000467950 in main (argc=3, argv=0xc1b9f0) at main.c:228

Regards,
Bo Ørsted Andresen
