Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: REINDEX INDEX results in a crash for an index of pg_class since 9.6
Date: 2019-04-30 16:40:46
Message-ID: 5203.1556642446@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> I haven't been able to reproduce this locally yet, but my guess is that
> the REINDEX wants to update some row that was already updated by the
> concurrent transaction, so it has to wait to see if the latter commits
> or not. And, of course, waiting while holding AccessExclusiveLock on
> any index of pg_class is a Bad Idea (TM). But I can't quite see why
> we'd be doing something like that during the reindex ...

Ah-hah: the secret to making it reproducible is what buildfarm member prion is doing:
-DRELCACHE_FORCE_RELEASE -DCATCACHE_FORCE_RELEASE

Here's a stack trace from reindex's side:

#0 0x00000033968e9223 in __epoll_wait_nocancel ()
    at ../sysdeps/unix/syscall-template.S:82
#1 0x0000000000787cb5 in WaitEventSetWaitBlock (set=0x22d52f0, timeout=-1,
    occurred_events=0x7ffc77117c00, nevents=1,
    wait_event_info=<value optimized out>) at latch.c:1080
#2 WaitEventSetWait (set=0x22d52f0, timeout=-1,
    occurred_events=0x7ffc77117c00, nevents=1,
    wait_event_info=<value optimized out>) at latch.c:1032
#3 0x00000000007886da in WaitLatchOrSocket (latch=0x7f90679077f4,
    wakeEvents=<value optimized out>, sock=-1, timeout=-1,
    wait_event_info=50331652) at latch.c:407
#4 0x000000000079993d in ProcSleep (locallock=<value optimized out>,
    lockMethodTable=<value optimized out>) at proc.c:1290
#5 0x0000000000796ba2 in WaitOnLock (locallock=0x2200600, owner=0x2213470)
    at lock.c:1768
#6 0x0000000000798719 in LockAcquireExtended (locktag=0x7ffc77117f90,
    lockmode=<value optimized out>, sessionLock=<value optimized out>,
    dontWait=false, reportMemoryError=true, locallockp=0x0) at lock.c:1050
#7 0x00000000007939b7 in XactLockTableWait (xid=2874,
    rel=<value optimized out>, ctid=<value optimized out>,
    oper=XLTW_InsertIndexUnique) at lmgr.c:658
#8 0x00000000004d4841 in heapam_index_build_range_scan (
    heapRelation=0x7f905eb3fcd8, indexRelation=0x7f905eb3c5b8,
    indexInfo=0x22d50c0, allow_sync=<value optimized out>, anyvisible=false,
    progress=true, start_blockno=0, numblocks=4294967295,
    callback=0x4f8330 <_bt_build_callback>, callback_state=0x7ffc771184f0,
    scan=0x2446fb0) at heapam_handler.c:1527
#9 0x00000000004f9db0 in table_index_build_scan (heap=0x7f905eb3fcd8,
    index=0x7f905eb3c5b8, indexInfo=0x22d50c0)
    at ../../../../src/include/access/tableam.h:1437
#10 _bt_spools_heapscan (heap=0x7f905eb3fcd8, index=0x7f905eb3c5b8,
    indexInfo=0x22d50c0) at nbtsort.c:489
#11 btbuild (heap=0x7f905eb3fcd8, index=0x7f905eb3c5b8, indexInfo=0x22d50c0)
    at nbtsort.c:337
#12 0x0000000000547e33 in index_build (heapRelation=0x7f905eb3fcd8,
    indexRelation=0x7f905eb3c5b8, indexInfo=0x22d50c0, isreindex=true,
    parallel=<value optimized out>) at index.c:2724
#13 0x0000000000548b97 in reindex_index (indexId=2662,
    skip_constraint_checks=false, persistence=112 'p', options=0)
    at index.c:3349
#14 0x00000000005490f1 in reindex_relation (relid=<value optimized out>,
    flags=5, options=0) at index.c:3592
#15 0x00000000005ed295 in ReindexTable (relation=0x21e2938, options=0,
    concurrent=<value optimized out>) at indexcmds.c:2422
#16 0x00000000007b5f69 in standard_ProcessUtility (pstmt=0x21e2cf0,
    queryString=0x21e1f18 "REINDEX TABLE pg_class;",
    context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
    dest=0x21e2de8, completionTag=0x7ffc77118d80 "") at utility.c:790
#17 0x00000000007b1689 in PortalRunUtility (portal=0x2247c38, pstmt=0x21e2cf0,
    isTopLevel=<value optimized out>, setHoldSnapshot=<value optimized out>,
    dest=0x21e2de8, completionTag=<value optimized out>) at pquery.c:1175
#18 0x00000000007b2611 in PortalRunMulti (portal=0x2247c38, isTopLevel=true,
    setHoldSnapshot=false, dest=0x21e2de8, altdest=0x21e2de8,
    completionTag=0x7ffc77118d80 "") at pquery.c:1328
#19 0x00000000007b2eb0 in PortalRun (portal=0x2247c38,
    count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x21e2de8,
    altdest=0x21e2de8, completionTag=0x7ffc77118d80 "") at pquery.c:796
#20 0x00000000007af2ab in exec_simple_query (
    query_string=0x21e1f18 "REINDEX TABLE pg_class;") at postgres.c:1215

So basically, the problem here lies in trying to re-verify uniqueness
of pg_class's indexes --- there could easily be entries in pg_class that
haven't committed yet.

I don't think there's an easy way to make this not deadlock against
concurrent DDL. For sure I don't want to disable the uniqueness
checks.
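
To make the deadlock shape concrete, here is a toy wait-for graph, not PostgreSQL code. The first edge is what the trace shows: the REINDEX backend sleeping in XactLockTableWait on another transaction's xid. The second edge is an assumption for illustration: that the concurrent DDL transaction in turn needs the pg_class index that REINDEX holds locked. The names "reindex" and "ddl_xact" and the helper find_cycle are made up for this sketch.

```python
from typing import Dict, List, Optional


def find_cycle(waits_for: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle in a wait-for graph as a list of nodes, or None."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on current DFS path, finished
    color: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        path.append(node)
        for blocker in waits_for.get(node, []):
            state = color.get(blocker, WHITE)
            if state == GREY:
                # blocker is already on the current path: that is a cycle
                return path[path.index(blocker):] + [blocker]
            if state == WHITE:
                found = visit(blocker)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in list(waits_for):
        if color.get(node, WHITE) == WHITE:
            found = visit(node)
            if found:
                return found
    return None


# Hypothetical scenario modeled on the trace above.
waits_for = {
    # From the trace: REINDEX waits in XactLockTableWait for the other xact.
    "reindex": ["ddl_xact"],
    # Assumption: the DDL xact needs the pg_class index REINDEX has locked.
    "ddl_xact": ["reindex"],
}

cycle = find_cycle(waits_for)
print(" -> ".join(cycle) if cycle else "no deadlock")
```

If the second edge is absent (the other transaction never touches the locked index), the graph has no cycle and REINDEX just waits until that transaction ends.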

regards, tom lane
