Re: BUG #16162: create index using gist_trgm_ops leads to panic

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: kunert(at)cms(dot)hu-berlin(dot)de, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #16162: create index using gist_trgm_ops leads to panic
Date: 2019-12-12 12:50:23
Message-ID: 20191212125023.vpk6zbbeldzmyerc@development
Lists: pgsql-bugs

On Thu, Dec 12, 2019 at 10:22:21AM +0000, PG Bug reporting form wrote:
>The following bug has been logged on the website:
>
>Bug reference: 16162
>Logged by: Andreas Kunert
>Email address: kunert(at)cms(dot)hu-berlin(dot)de
>PostgreSQL version: 12.1
>Operating system: Ubuntu 16.04
>Description:
>
>Hello,
>
>while testing an upcoming migration of our database server to new hardware,
>including an update to a new Postgres version, I observed the following
>behaviour on the new server running a self-compiled Postgres 12.1 under
>Ubuntu 16.04:
>
>When importing a complete pg_dump, roughly one in three times a panic occurs
>while trying to create a pg_trgm GiST index, and the cluster then restarts.
>The corresponding excerpt from the log:
>
>2019-12-12 10:32:03.886 CET [36999] LOG: statement: CREATE INDEX gist_idx
>ON public.mytable USING gist (am_description public.gist_trgm_ops);
>2019-12-12 10:32:04.370 CET [36999] PANIC: invalid index offnum: 4
>2019-12-12 10:32:04.370 CET [36999] STATEMENT: CREATE INDEX gist_idx ON
>public.mytable USING gist (am_description public.gist_trgm_ops);
>2019-12-12 10:32:04.914 CET [31336] LOG: server process (PID 36999) was
>terminated by signal 6: Aborted
>2019-12-12 10:32:04.914 CET [31336] DETAIL: Failed process was running:
>CREATE INDEX gist_idx ON public.mytable USING gist (am_description
>public.gist_trgm_ops);
>2019-12-12 10:32:04.914 CET [31336] LOG: terminating any other active
>server processes
>
>I could successfully recreate the behaviour on another server (running a
>default-configured packaged version of Postgres 12 for Ubuntu 16.04 with a
>German locale) - the log looks like this:
>
>2019-12-11 17:15:00.735 CET [14959] test_gist_index(at)test_gist_index PANIK:
>invalid index offnum: 5
>2019-12-11 17:15:00.735 CET [14959] test_gist_index(at)test_gist_index
>ANWEISUNG: CREATE INDEX gist_idx ON public.mytable USING
>gist(am_description public.gist_trgm_ops);
>2019-12-11 17:15:07.501 CET [7436] LOG: Serverprozess (PID 14959) wurde von
>Signal 6 beendet: Abgebrochen
>2019-12-11 17:15:07.501 CET [7436] DETAIL: Der fehlgeschlagene Prozess
>führte aus: CREATE INDEX gist_idx ON public.mytable USING
>gist(am_description public.gist_trgm_ops);
>2019-12-11 17:15:07.501 CET [7436] LOG: aktive Serverprozesse werden
>abgebrochen
>2019-12-11 17:15:07.515 CET [7436] LOG: alle Serverprozesse beendet;
>initialisiere neu
>
>
>Unfortunately the database contains sensitive data, but after a couple of
>days and endless tests I managed to create a similarly built database
>containing only random data which provokes the same behaviour (the log
>messages above came from my test database). It consists of only one table
>with two columns (serial, text) containing 200000 rows filled with random
>strings of lengths varying from 1 to 1000 characters.
>
>A perl script creating the database, creating and filling the table and
>finally iteratively creating and dropping the index is at the end of this
>mail.
>

Thanks for the report. The reproducer perl script is very useful; I
managed to reproduce the crash pretty quickly. With asserts enabled, the
backtrace from master looks like this:

Core was generated by `postgres: user test ::1(44878) CREATE INDEX '.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007dc91a786e35 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: dnf debuginfo-install glibc-2.29-27.fc30.x86_64 libgcc-9.2.1-1.fc30.x86_64
(gdb) bt
#0 0x00007dc91a786e35 in raise () from /lib64/libc.so.6
#1 0x00007dc91a771895 in abort () from /lib64/libc.so.6
#2 0x0000000000a8b375 in ExceptionalCondition (conditionName=0xb08e78 "ItemIdHasStorage(((ItemId) (&((PageHeader) (page))->pd_linp[(todelete) - 1])))", errorType=0xb08d2a "FailedAssertion", fileName=0xb08e21 "gistutil.c", lineNumber=69) at assert.c:67
#3 0x00000000004c6685 in gistnospace (page=0x7dc919d73d00 "", itvec=0x7ffcbf3cacb0, len=2, todelete=8, freespace=819) at gistutil.c:69
#4 0x00000000004b64b5 in gistplacetopage (rel=0x7dc911705a70, freespace=819, giststate=0x1dc9400, buffer=16377, itup=0x7ffcbf3cacb0, ntup=2, oldoffnum=8, newblkno=0x0, leftchildbuf=16299, splitinfo=0x7ffcbf3cac40, markfollowright=true, heapRel=0x7dc9116f6970, is_build=true) at gist.c:256
#5 0x00000000004b9868 in gistinserttuples (state=0x7ffcbf3cadd0, stack=0x7ffcbf3cae00, giststate=0x1dc9400, tuples=0x7ffcbf3cacb0, ntup=2, oldoffnum=8, leftchild=16299, rightchild=15979, unlockbuf=true, unlockleftchild=false) at gist.c:1266
#6 0x00000000004b9af0 in gistfinishsplit (state=0x7ffcbf3cadd0, stack=0x1def6a8, giststate=0x1dc9400, splitinfo=0x1df4568, unlockbuf=false) at gist.c:1363
#7 0x00000000004b9922 in gistinserttuples (state=0x7ffcbf3cadd0, stack=0x1def6a8, giststate=0x1dc9400, tuples=0x7ffcbf3cad70, ntup=1, oldoffnum=0, leftchild=0, rightchild=0, unlockbuf=false, unlockleftchild=false) at gist.c:1293
#8 0x00000000004b97b3 in gistinserttuple (state=0x7ffcbf3cadd0, stack=0x1def6a8, giststate=0x1dc9400, tuple=0x1ded6f8, oldoffnum=0) at gist.c:1219
#9 0x00000000004b8531 in gistdoinsert (r=0x7dc911705a70, itup=0x1ded6f8, freespace=819, giststate=0x1dc9400, heapRel=0x7dc9116f6970, is_build=true) at gist.c:875
#10 0x00000000004bb3a9 in gistBuildCallback (index=0x7dc911705a70, tid=0x1de09bc, values=0x7ffcbf3cb1a0, isnull=0x7ffcbf3cb180, tupleIsAlive=true, state=0x7ffcbf3cb3f0) at gistbuild.c:470
#11 0x00000000004f71b7 in heapam_index_build_range_scan (heapRelation=0x7dc9116f6970, indexRelation=0x7dc911705a70, indexInfo=0x1ddfac0, allow_sync=true, anyvisible=false, progress=true, start_blockno=0, numblocks=4294967295, callback=0x4bb2da <gistBuildCallback>, callback_state=0x7ffcbf3cb3f0, scan=0x1de0968)
at heapam_handler.c:1659
#12 0x00000000004bab42 in table_index_build_scan (table_rel=0x7dc9116f6970, index_rel=0x7dc911705a70, index_info=0x1ddfac0, allow_sync=true, progress=true, callback=0x4bb2da <gistBuildCallback>, callback_state=0x7ffcbf3cb3f0, scan=0x0) at ../../../../src/include/access/tableam.h:1508
#13 0x00000000004bae2a in gistbuild (heap=0x7dc9116f6970, index=0x7dc911705a70, indexInfo=0x1ddfac0) at gistbuild.c:196
#14 0x00000000005a0019 in index_build (heapRelation=0x7dc9116f6970, indexRelation=0x7dc911705a70, indexInfo=0x1ddfac0, isreindex=false, parallel=true) at index.c:2871
#15 0x000000000059d510 in index_create (heapRelation=0x7dc9116f6970, indexRelationName=0x1d8a670 "gist_idx", indexRelationId=16512, parentIndexRelid=0, parentConstraintId=0, relFileNode=0, indexInfo=0x1ddfac0, indexColNames=0x1dbbef8, accessMethodObjectId=783, tableSpaceId=0, collationObjectId=0x1dbbf70,
classObjectId=0x1dbbf90, coloptions=0x1dbbfb0, reloptions=0, flags=0, constr_flags=0, allow_system_table_mods=false, is_internal=false, constraintId=0x7ffcbf3cb734) at index.c:1221
#16 0x0000000000680605 in DefineIndex (relationId=16485, stmt=0x1d8a5d8, indexRelationId=0, parentIndexId=0, parentConstraintId=0, is_alter_table=false, check_rights=true, check_not_in_use=true, skip_build=false, quiet=false) at indexcmds.c:987
#17 0x000000000091679b in ProcessUtilitySlow (pstate=0x1d8a4c0, pstmt=0x1cf9678, queryString=0x1cf87d0 "CREATE INDEX gist_idx ON public.mytable USING gist (am_description public.gist_trgm_ops);", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x1cf9958, completionTag=0x7ffcbf3cc040 "")
at utility.c:1368
#18 0x000000000091579c in standard_ProcessUtility (pstmt=0x1cf9678, queryString=0x1cf87d0 "CREATE INDEX gist_idx ON public.mytable USING gist (am_description public.gist_trgm_ops);", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x1cf9958, completionTag=0x7ffcbf3cc040 "") at utility.c:922
#19 0x0000000000914894 in ProcessUtility (pstmt=0x1cf9678, queryString=0x1cf87d0 "CREATE INDEX gist_idx ON public.mytable USING gist (am_description public.gist_trgm_ops);", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x1cf9958, completionTag=0x7ffcbf3cc040 "") at utility.c:359
#20 0x000000000091377b in PortalRunUtility (portal=0x1d603e0, pstmt=0x1cf9678, isTopLevel=true, setHoldSnapshot=false, dest=0x1cf9958, completionTag=0x7ffcbf3cc040 "") at pquery.c:1175
#21 0x0000000000913997 in PortalRunMulti (portal=0x1d603e0, isTopLevel=true, setHoldSnapshot=false, dest=0x1cf9958, altdest=0x1cf9958, completionTag=0x7ffcbf3cc040 "") at pquery.c:1321
#22 0x0000000000912ec1 in PortalRun (portal=0x1d603e0, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x1cf9958, altdest=0x1cf9958, completionTag=0x7ffcbf3cc040 "") at pquery.c:796
#23 0x000000000090cbf9 in exec_simple_query (query_string=0x1cf87d0 "CREATE INDEX gist_idx ON public.mytable USING gist (am_description public.gist_trgm_ops);") at postgres.c:1226
#24 0x0000000000910ff4 in PostgresMain (argc=1, argv=0x1d243f0, dbname=0x1d243d0 "test", username=0x1d243b0 "user") at postgres.c:4306
#25 0x0000000000864a19 in BackendRun (port=0x1d1d310) at postmaster.c:4498
#26 0x00000000008641ea in BackendStartup (port=0x1d1d310) at postmaster.c:4189
#27 0x0000000000860661 in ServerLoop () at postmaster.c:1727
#28 0x000000000085ff0d in PostmasterMain (argc=3, argv=0x1cf23b0) at postmaster.c:1400
#29 0x000000000077037e in main (argc=3, argv=0x1cf23b0) at main.c:210
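
(For anyone wanting to check their own build: debug_assertions is a
read-only setting that reports 'on' only for --enable-cassert builds.
Without asserts, the same failure presumably shows up as the
"invalid index offnum" PANIC from the original report rather than this
assertion.)

    SHOW debug_assertions;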

This only happens on some runs (in my case the CREATE INDEX passed 8
times without any issue and failed on the 9th iteration), so there has
to be some random element affecting this.
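
In case it saves someone the setup time, a condensed SQL-only sketch
along the lines of the reporter's description would be something like
the following (one table with a serial and a text column, 200000 rows of
random strings of 1 to 1000 characters). The "id" column name is assumed,
the rest is taken from the report - this is just a sketch, not the
reporter's attached perl script:

    CREATE EXTENSION IF NOT EXISTS pg_trgm;

    CREATE TABLE public.mytable (
        id             serial PRIMARY KEY,
        am_description text
    );

    -- 200000 rows of random text, 1..1000 characters each
    INSERT INTO public.mytable (am_description)
    SELECT left(r.chunk, 1 + (random() * 999)::int)
    FROM generate_series(1, 200000) AS g,
         LATERAL (SELECT string_agg(md5(g::text || s::text || random()::text), '') AS chunk
                  FROM generate_series(1, 32) AS s) AS r;

    -- then create and drop the index in a loop until it trips
    CREATE INDEX gist_idx ON public.mytable USING gist
        (am_description public.gist_trgm_ops);
    DROP INDEX gist_idx;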

FWIW, the first place where offnum=8 happens to be used is this call in
gistfinishsplit() - it's the stack->downlinkoffnum:

    gistinserttuples(state, stack->parent, giststate,
                     tuples, 2,
                     stack->downlinkoffnum,  /* downlink offset in the parent page */
                     left->buf, right->buf,
                     true,       /* Unlock parent */
                     unlockbuf   /* Unlock stack->buffer if caller wants that */
                     );

so maybe the GiST insertion stack is stale, not updated, or something
like that. I think I've seen an issue mentioning that recently ... yep,
found it [1].

And I see Tom came to about the same conclusion, pointing to the same
place in gistfinishsplit(). That report is only a couple of days old; I
don't think I've seen any proposed fixes yet.

[1] https://www.postgresql.org/message-id/flat/16134-0423f729671dec64%40postgresql.org

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
