Re: Segmentation Fault in logical decoding get/peek API

From: Jeremy Finzel <finzelj(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Segmentation Fault in logical decoding get/peek API
Date: 2019-02-18 20:06:50
Message-ID: CAMa1XUjoL_DhD4AWz+MRZYYqjXt=vNcUhuy=JPK2LKS3boVqGQ@mail.gmail.com
Lists: pgsql-bugs

>
> Well, as Peter said, "git bisect" and trying to reproduce the problem
> at each step would be the way to prove it definitively. Seems mighty
> tedious though. Possibly you could shave some time off the process
> by assuming it must have been one of the commits that touched
> reorderbuffer.c ... a quick check says there have been ten of those
> in the v10 branch since 10.3.
>
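The approach suggested in the quoted reply, bisecting only over commits that touched reorderbuffer.c, can be sketched with git's pathspec support for `git bisect start`. This is a minimal sketch under stated assumptions: the helper names `candidate_commits` and `bisect_path`, the revisions `REL_10_3` / `REL_10_STABLE`, and the reproduction script `./repro.sh` are illustrative and not part of the original thread.

```bash
#!/bin/bash
# Minimal sketch (hypothetical helpers): bisect only over commits that
# modified a single file, e.g. reorderbuffer.c.
set -eu

# Print the commits in good..bad that modified "$path", oldest first.
candidate_commits() {
    local good=$1 bad=$2 path=$3
    git log --oneline --reverse "$good..$bad" -- "$path"
}

# Bisect between a known-good and a known-bad revision, testing only
# commits that touched "$path". The reproduction script must exit 0 if
# the commit is good, 125 to skip it, and 1-127 (except 125) if bad.
bisect_path() {
    local good=$1 bad=$2 path=$3 repro=$4
    git bisect start "$bad" "$good" -- "$path"
    git bisect run "$repro"
    git bisect reset
}
```

Run from the PostgreSQL checkout, for example `candidate_commits REL_10_3 REL_10_STABLE src/backend/replication/logical/reorderbuffer.c`, assuming 10.3 is known good. Limiting by path only helps if the culprit actually touched that file; otherwise bisect may blame the wrong commit.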

Update:

- I definitely got the same segfault on commit 0bb28ca, which is after 10.4
- I am now getting a different crash on 10.5 (a failed assertion, SIGABRT,
rather than a segfault), but I need another set of eyes to verify that I am
not compiling it wrong

After decoding successfully for a while, I now get an immediate crash as
soon as I call pg_logical_slot_peek_changes. Here is the backtrace:

$ sudo -u postgres gdb -q -c /san/<cluster>/pgdata/core /usr/lib/postgresql/10.5/bin/postgres
Reading symbols from /usr/lib/postgresql/10.5/bin/postgres...done.
[New LWP 22699]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: <cluster>: jfinzel foo_db 10.7.111.37(52316) FETCH'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007eff42d54428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0 0x00007eff42d54428 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x00007eff42d5602a in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x0000000000a45f9c in ExceptionalCondition (conditionName=0xc2d688
"!(prev_first_lsn < cur_txn->first_lsn)", errorType=0xc2d404
"FailedAssertion", fileName=0xc2d478 "reorderbuffer.c", lineNumber=688) at
assert.c:54
#3 0x000000000084b0ac in AssertTXNLsnOrder (rb=0x28ed790) at
reorderbuffer.c:688
#4 0x000000000084ab97 in ReorderBufferTXNByXid (rb=0x28ed790,
xid=319299822, create=1 '\001', is_new=0x0, lsn=9888781386112,
create_as_top=1 '\001') at reorderbuffer.c:567
#5 0x000000000084d86c in ReorderBufferAddNewTupleCids (rb=0x28ed790,
xid=319299822, lsn=9888781386112, node=..., tid=..., cmin=2,
cmax=4294967295, combocid=4294967295) at reorderbuffer.c:2053
#6 0x00000000008522b6 in SnapBuildProcessNewCid (builder=0x28f57c0,
xid=319299827, lsn=9888781386112, xlrec=0x2821c08) at snapbuild.c:780
#7 0x000000000083f280 in DecodeHeap2Op (ctx=0x28dd720, buf=0x7ffc5b73e2d0)
at decode.c:371
#8 0x000000000083ebb1 in LogicalDecodingProcessRecord (ctx=0x28dd720,
record=0x28dd9e0) at decode.c:121
#9 0x0000000000844f86 in pg_logical_slot_get_changes_guts
(fcinfo=0x7ffc5b73e600, confirm=0 '\000', binary=0 '\000') at
logicalfuncs.c:308
#10 0x000000000084514d in pg_logical_slot_peek_changes
(fcinfo=0x7ffc5b73e600) at logicalfuncs.c:381
#11 0x00000000006f7973 in ExecMakeTableFunctionResult (setexpr=0x28265b8,
econtext=0x28262b0, argContext=0x28b4af0, expectedDesc=0x28d1d20,
randomAccess=4 '\004') at execSRF.c:231
#12 0x000000000070a870 in FunctionNext (node=0x2826198) at
nodeFunctionscan.c:94
#13 0x00000000006f6f6e in ExecScanFetch (node=0x2826198, accessMtd=0x70a7b9
<FunctionNext>, recheckMtd=0x70aba1 <FunctionRecheck>) at execScan.c:97
#14 0x00000000006f6fdd in ExecScan (node=0x2826198, accessMtd=0x70a7b9
<FunctionNext>, recheckMtd=0x70aba1 <FunctionRecheck>) at execScan.c:147
#15 0x000000000070abef in ExecFunctionScan (pstate=0x2826198) at
nodeFunctionscan.c:270
#16 0x00000000006f541a in ExecProcNodeFirst (node=0x2826198) at
execProcnode.c:430
#17 0x00000000006ed5af in ExecProcNode (node=0x2826198) at
../../../src/include/executor/executor.h:250
#18 0x00000000006effaf in ExecutePlan (estate=0x2825f80,
planstate=0x2826198, use_parallel_mode=0 '\000', operation=CMD_SELECT,
sendTuples=1 '\001', numberTuples=2000, direction=ForwardScanDirection,
dest=0x27ffc78, execute_once=0 '\000') at execMain.c:1722
#19 0x00000000006edbc4 in standard_ExecutorRun (queryDesc=0x2825130,
direction=ForwardScanDirection, count=2000, execute_once=0 '\000') at
execMain.c:363
#20 0x00000000006ed9de in ExecutorRun (queryDesc=0x2825130,
direction=ForwardScanDirection, count=2000, execute_once=0 '\000') at
execMain.c:306
#21 0x00000000008d0dd7 in PortalRunSelect (portal=0x27f70a8, forward=1
'\001', count=2000, dest=0x27ffc78) at pquery.c:932
#22 0x00000000008d200f in DoPortalRunFetch (portal=0x27f70a8,
fdirection=FETCH_FORWARD, count=2000, dest=0x27ffc78) at pquery.c:1675
#23 0x00000000008d19df in PortalRunFetch (portal=0x27f70a8,
fdirection=FETCH_FORWARD, count=2000, dest=0x27ffc78) at pquery.c:1434
#24 0x00000000006833bb in PerformPortalFetch (stmt=0x2888570,
dest=0x27ffc78, completionTag=0x7ffc5b73f0f0 "") at portalcmds.c:199
#25 0x00000000008d2ab6 in standard_ProcessUtility (pstmt=0x28888d0,
queryString=0x2887b30 "FETCH FORWARD 2000 FROM crash_dude;",
context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x27ffc78,
completionTag=0x7ffc5b73f0f0 "") at utility.c:527
#26 0x00007eff42829eb6 in pglogical_ProcessUtility (pstmt=0x28888d0,
queryString=0x2887b30 "FETCH FORWARD 2000 FROM crash_dude;",
context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x27ffc78,
completionTag=0x7ffc5b73f0f0 "") at pglogical_executor.c:279
#27 0x00000000008d2547 in ProcessUtility (pstmt=0x28888d0,
queryString=0x2887b30 "FETCH FORWARD 2000 FROM crash_dude;",
context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x27ffc78,
completionTag=0x7ffc5b73f0f0 "") at utility.c:353
#28 0x00000000008d141b in PortalRunUtility (portal=0x27f6f90,
pstmt=0x28888d0, isTopLevel=1 '\001', setHoldSnapshot=1 '\001',
dest=0x27ffc78, completionTag=0x7ffc5b73f0f0 "") at pquery.c:1178
#29 0x00000000008d1119 in FillPortalStore (portal=0x27f6f90, isTopLevel=1
'\001') at pquery.c:1038
#30 0x00000000008d09a1 in PortalRun (portal=0x27f6f90,
count=9223372036854775807, isTopLevel=1 '\001', run_once=1 '\001',
dest=0x28889c8, altdest=0x28889c8, completionTag=0x7ffc5b73f350 "") at
pquery.c:768
#31 0x00000000008c9f67 in exec_simple_query (query_string=0x2887b30 "FETCH
FORWARD 2000 FROM crash_dude;") at postgres.c:1099
#32 0x00000000008cea3c in PostgresMain (argc=1, argv=0x2804e50,
dbname=0x2804e28 "<foo_db>", username=0x2804e08 "jfinzel") at
postgres.c:4088
#33 0x000000000082369b in BackendRun (port=0x2801170) at postmaster.c:4405
#34 0x0000000000822d02 in BackendStartup (port=0x2801170) at
postmaster.c:4077
#35 0x000000000081ee31 in ServerLoop () at postmaster.c:1755
#36 0x000000000081e2d9 in PostmasterMain (argc=3, argv=0x27d79a0) at
postmaster.c:1363
#37 0x0000000000751669 in main (argc=3, argv=0x27d79a0) at main.c:228

Here is the script I used to compile and install 10.5 (at commit
4191e37a9a1fb598267c445c717914012d9bc423). The affected cluster also uses
the extensions built below:

$ cat make_postgres
#!/bin/bash

set -eu

dirname=$1

instdir=/usr/lib/postgresql/$dirname

# Install Postgres
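# Put the new installation's bin directory (and its pg_config) first on
# PATH, so the extension builds below compile against this installation
export PATH=$instdir/bin:/home/jfinzel/bin:/home/jfinzel/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
sudo mkdir "$instdir"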

# This is my directory with source code from git, checked out at commit
# 4191e37a9a1fb598267c445c717914012d9bc423
cd ~/postgres_source/postgres
./configure --prefix="$instdir" --enable-cassert --enable-debug \
    CFLAGS="-ggdb -g3 -fno-omit-frame-pointer -fPIC"
make
sudo "PATH=$PATH" make install

# Contrib
cd contrib/btree_gist/
sudo "PATH=$PATH" make install
cd ../test_decoding/
sudo "PATH=$PATH" make install

# Install Pglogical
cd /usr/src/pglogical-2.2.1
sudo "PATH=$PATH" make clean
sudo "PATH=$PATH" make install

# Install Extensions
cd $HOME/pgl_ddl_deploy
make clean
sudo "PATH=$PATH" make install
cd $HOME/pglogical_ticker
make clean
sudo "PATH=$PATH" make install
cd $HOME/pg_fact_loader
make clean
sudo "PATH=$PATH" make install

$ ./make_postgres 10.5
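Since the open question is whether the 10.5 build itself is correct, one quick sanity check is to ask the installed `pg_config` which version it reports and whether `--enable-cassert` was passed to configure. The helper names `has_cassert` and `check_install` are hypothetical; `pg_config --version` and `pg_config --configure` are standard options. Note that a `FailedAssertion` like the one in the backtrace can only fire in a build with assertions enabled.

```bash
#!/bin/bash
# Minimal sketch (hypothetical helpers): sanity-check an installation
# produced by the make_postgres script.
set -eu

# Succeed if a configure-options string contains --enable-cassert.
has_cassert() {
    case "$1" in
        *--enable-cassert*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report the version and assertion setting of the installation rooted
# at "$1" (for example /usr/lib/postgresql/10.5).
check_install() {
    local instdir=$1 opts
    opts=$("$instdir/bin/pg_config" --configure)
    "$instdir/bin/pg_config" --version
    if has_cassert "$opts"; then
        echo "assertions: enabled"
    else
        echo "assertions: disabled"
    fi
}
```

For example, `check_install /usr/lib/postgresql/10.5`. From a live session, `SELECT version();` and `SHOW debug_assertions;` report what the running server was actually built with.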

Thanks!
Jeremy
