Re: v12.0: segfault in reindex CONCURRENTLY

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: Re: v12.0: segfault in reindex CONCURRENTLY
Date: 2019-10-13 13:03:21
Message-ID: 20191013130321.GD4475@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Oct 13, 2019 at 06:06:43PM +0900, Michael Paquier wrote:
> On Fri, Oct 11, 2019 at 07:44:46PM -0500, Justin Pryzby wrote:
> > Unfortunately, there was no core file, and I'm still trying to reproduce it.
>
> Forgot to set ulimit -c? Having a backtrace would surely help.

Fortunately (?) another server hit crashed last night.
(Doesn't appear to be relevant, but this table has no inheritence/partition-ness).

Looks like it's a race condition and dereferencing *holder=NULL. The first
crash was probably the same bug, due to report query running during "reindex
CONCURRENTLY", and probably finished at nearly the same time as another locker.

Relevant code introduced here:

commit ab0dfc961b6a821f23d9c40c723d11380ce195a6
Author: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Date: Tue Apr 2 15:18:08 2019 -0300

Report progress of CREATE INDEX operations

Needs to be conditionalized (as anticipated by the comment)

+ if (holder)
pgstat_progress_update_param(PROGRESS_WAITFOR_CURRENT_PID,
holder->pid);

Core was generated by `postgres: postgres ts [local] REINDEX '.
Program terminated with signal 11, Segmentation fault.

#0 WaitForLockersMultiple (locktags=locktags(at)entry=0x1d30548, lockmode=lockmode(at)entry=5, progress=progress(at)entry=true) at lmgr.c:911
#1 0x00000000005c2ac8 in ReindexRelationConcurrently (relationOid=relationOid(at)entry=17618, options=options(at)entry=0) at indexcmds.c:3090
#2 0x00000000005c328a in ReindexIndex (indexRelation=<optimized out>, options=0, concurrent=<optimized out>) at indexcmds.c:2352
#3 0x00000000007657fe in standard_ProcessUtility (pstmt=pstmt(at)entry=0x1d05468, queryString=queryString(at)entry=0x1d046e0 "REINDEX INDEX CONCURRENTLY loaded_cdr_files_filename",
context=context(at)entry=PROCESS_UTILITY_TOPLEVEL, params=params(at)entry=0x0, queryEnv=queryEnv(at)entry=0x0, dest=dest(at)entry=0x1d05548,
completionTag=completionTag(at)entry=0x7ffc05e6c0a0 "") at utility.c:787
#4 0x00007f21517204ef in pgss_ProcessUtility (pstmt=0x1d05468, queryString=0x1d046e0 "REINDEX INDEX CONCURRENTLY loaded_cdr_files_filename", context=PROCESS_UTILITY_TOPLEVEL,
params=0x0, queryEnv=0x0, dest=0x1d05548, completionTag=0x7ffc05e6c0a0 "") at pg_stat_statements.c:1006
#5 0x0000000000762816 in PortalRunUtility (portal=0x1d7a4e0, pstmt=0x1d05468, isTopLevel=<optimized out>, setHoldSnapshot=<optimized out>, dest=0x1d05548,
completionTag=0x7ffc05e6c0a0 "") at pquery.c:1175
#6 0x0000000000763267 in PortalRunMulti (portal=portal(at)entry=0x1d7a4e0, isTopLevel=isTopLevel(at)entry=true, setHoldSnapshot=setHoldSnapshot(at)entry=false, dest=dest(at)entry=0x1d05548,
altdest=altdest(at)entry=0x1d05548, completionTag=completionTag(at)entry=0x7ffc05e6c0a0 "") at pquery.c:1328
#7 0x0000000000763e45 in PortalRun (portal=<optimized out>, count=9223372036854775807, isTopLevel=<optimized out>, run_once=<optimized out>, dest=0x1d05548, altdest=0x1d05548,
completionTag=0x7ffc05e6c0a0 "") at pquery.c:796
#8 0x000000000075ff45 in exec_simple_query (query_string=<optimized out>) at postgres.c:1215
#9 0x0000000000761212 in PostgresMain (argc=<optimized out>, argv=<optimized out>, dbname=<optimized out>, username=<optimized out>) at postgres.c:4236
#10 0x0000000000483d02 in BackendRun (port=<optimized out>, port=<optimized out>) at postmaster.c:4431
#11 BackendStartup (port=0x1d2b340) at postmaster.c:4122
#12 ServerLoop () at postmaster.c:1704
#13 0x00000000006f0b1f in PostmasterMain (argc=argc(at)entry=3, argv=argv(at)entry=0x1cff280) at postmaster.c:1377
#14 0x0000000000484c93 in main (argc=3, argv=0x1cff280) at main.c:228

bt f

#0 WaitForLockersMultiple (locktags=locktags(at)entry=0x1d30548, lockmode=lockmode(at)entry=5, progress=progress(at)entry=true) at lmgr.c:911
holder = 0x0
lockholders = 0x1d9b778
holders = <optimized out>
lc = 0x1d9bf80
total = <optimized out>
done = 1
#1 0x00000000005c2ac8 in ReindexRelationConcurrently (relationOid=relationOid(at)entry=17618, options=options(at)entry=0) at indexcmds.c:3090
heapRelationIds = 0x1d30360
indexIds = 0x1d303b0
newIndexIds = <optimized out>
relationLocks = <optimized out>
lockTags = <optimized out>
lc = 0x0
lc2 = 0x0
private_context = <optimized out>
oldcontext = <optimized out>
relkind = 105 'i'
relationName = 0x0
relationNamespace = 0x0
ru0 = {tv = {tv_sec = 30592544, tv_usec = 7232025}, ru = {ru_utime = {tv_sec = 281483566645394, tv_usec = 75668733820930}, ru_stime = {tv_sec = 0, tv_usec = 30592272}, {
ru_maxrss = 0, __ru_maxrss_word = 0}, {ru_ixrss = 0, __ru_ixrss_word = 0}, {ru_idrss = 105, __ru_idrss_word = 105}, {ru_isrss = -926385342574214656,
__ru_isrss_word = -926385342574214656}, {ru_minflt = 8924839, __ru_minflt_word = 8924839}, {ru_majflt = 0, __ru_majflt_word = 0}, {ru_nswap = 17618,
__ru_nswap_word = 17618}, {ru_inblock = 139781327898864, __ru_inblock_word = 139781327898864}, {ru_oublock = 30430312, __ru_oublock_word = 30430312}, {
ru_msgsnd = 139781327898864, __ru_msgsnd_word = 139781327898864}, {ru_msgrcv = 139781327898864, __ru_msgrcv_word = 139781327898864}, {ru_nsignals = 139781327898864,
__ru_nsignals_word = 139781327898864}, {ru_nvcsw = 139781327898864, __ru_nvcsw_word = 139781327898864}, {ru_nivcsw = 139781327898864,
__ru_nivcsw_word = 139781327898864}}}
__func__ = "ReindexRelationConcurrently"
#2 0x00000000005c328a in ReindexIndex (indexRelation=<optimized out>, options=0, concurrent=<optimized out>) at indexcmds.c:2352
state = {concurrent = true, locked_table_oid = 17608}
indOid = 17618
irel = <optimized out>
persistence = 112 'p'
...

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2019-10-13 13:31:54 Re: stress test for parallel workers
Previous Message Amit Kapila 2019-10-13 11:46:25 Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions