pg15b3: crash in paralell vacuum

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Amit Kapila <akapila(at)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject: pg15b3: crash in paralell vacuum
Date: 2022-08-18 13:34:06
Message-ID: 20220818133406.GL26426@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Immediately after upgrading an internal instance, a loop around "vacuum" did
this:

TRAP: FailedAssertion("indstats->status == PARALLEL_INDVAC_STATUS_INITIAL", File: "vacuumparallel.c", Line: 611, PID: 27635)
postgres: postgres pryzbyj [local] VACUUM(ExceptionalCondition+0x8d)[0x99d9fd]
postgres: postgres pryzbyj [local] VACUUM[0x6915db]
postgres: postgres pryzbyj [local] VACUUM(heap_vacuum_rel+0x12b6)[0x5083e6]
postgres: postgres pryzbyj [local] VACUUM[0x68e97a]
postgres: postgres pryzbyj [local] VACUUM(vacuum+0x48e)[0x68fe9e]
postgres: postgres pryzbyj [local] VACUUM(ExecVacuum+0x2ae)[0x69065e]
postgres: postgres pryzbyj [local] VACUUM(standard_ProcessUtility+0x530)[0x8567b0]
/usr/pgsql-15/lib/pg_stat_statements.so(+0x5450)[0x7f52b891c450]
postgres: postgres pryzbyj [local] VACUUM[0x85490a]
postgres: postgres pryzbyj [local] VACUUM[0x854a53]
postgres: postgres pryzbyj [local] VACUUM(PortalRun+0x179)[0x855029]
postgres: postgres pryzbyj [local] VACUUM[0x85099b]
postgres: postgres pryzbyj [local] VACUUM(PostgresMain+0x199a)[0x85268a]
postgres: postgres pryzbyj [local] VACUUM[0x496a21]
postgres: postgres pryzbyj [local] VACUUM(PostmasterMain+0x11c0)[0x7b3980]
postgres: postgres pryzbyj [local] VACUUM(main+0x1c6)[0x4986a6]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f52c4b893d5]
postgres: postgres pryzbyj [local] VACUUM[0x498c59]
< 2022-08-18 07:56:51.963 CDT >LOG: server process (PID 27635) was terminated by signal 6: Aborted
< 2022-08-18 07:56:51.963 CDT >DETAIL: Failed process was running: VACUUM ANALYZE alarms

Unfortunately, it looks like the RPM packages are compiled with -O2, so this is
of limited use.

Core was generated by `postgres: postgres pryzbyj [local] VACUUM '.
Program terminated with signal 6, Aborted.
#0 0x00007f52c4b9d207 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install audit-libs-2.8.4-4.el7.x86_64 bzip2-libs-1.0.6-13.el7.x86_64 cyrus-sasl-lib-2.1.26-23.el7.x86_64 elfutils-libelf-0.176-5.el7.x86_64 elfutils-libs-0.176-5.el7.x86_64 glibc-2.17-260.el7_6.3.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-51.el7_9.x86_64 libattr-2.4.46-13.el7.x86_64 libcap-2.22-9.el7.x86_64 libcap-ng-0.7.5-4.el7.x86_64 libcom_err-1.42.9-19.el7.x86_64 libgcc-4.8.5-39.el7.x86_64 libgcrypt-1.5.3-14.el7.x86_64 libgpg-error-1.12-3.el7.x86_64 libicu-50.1.2-17.el7.x86_64 libselinux-2.5-15.el7.x86_64 libstdc++-4.8.5-39.el7.x86_64 libxml2-2.9.1-6.el7_9.6.x86_64 libzstd-1.5.2-1.el7.x86_64 lz4-1.7.5-2.el7.x86_64 nspr-4.19.0-1.el7_5.x86_64 nss-3.36.0-7.1.el7_6.x86_64 nss-softokn-freebl-3.36.0-5.el7_5.x86_64 nss-util-3.36.0-1.1.el7_6.x86_64 openldap-2.4.44-21.el7_6.x86_64 openssl-libs-1.0.2k-22.el7_9.x86_64 pam-1.1.8-22.el7.x86_64 pcre-8.32-17.el7.x86_64 systemd-libs-219-62.el7_6.5.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) bt
#0 0x00007f52c4b9d207 in raise () from /lib64/libc.so.6
#1 0x00007f52c4b9e8f8 in abort () from /lib64/libc.so.6
#2 0x000000000099da1e in ExceptionalCondition (conditionName=conditionName(at)entry=0xafae40 "indstats->status == PARALLEL_INDVAC_STATUS_INITIAL", errorType=errorType(at)entry=0x9fb4b7 "FailedAssertion",
fileName=fileName(at)entry=0xafb0c0 "vacuumparallel.c", lineNumber=lineNumber(at)entry=611) at assert.c:69
#3 0x00000000006915db in parallel_vacuum_process_all_indexes (pvs=0x2e85f80, num_index_scans=<optimized out>, vacuum=<optimized out>) at vacuumparallel.c:611
#4 0x00000000005083e6 in heap_vacuum_rel (rel=<optimized out>, params=<optimized out>, bstrategy=<optimized out>) at vacuumlazy.c:2679
#5 0x000000000068e97a in table_relation_vacuum (bstrategy=<optimized out>, params=0x7fff46de9a80, rel=0x7f52c7bc2c10) at ../../../src/include/access/tableam.h:1680
#6 vacuum_rel (relid=52187497, relation=<optimized out>, params=0x7fff46de9a80) at vacuum.c:2092
#7 0x000000000068fe9e in vacuum (relations=0x2dbeee8, params=params(at)entry=0x7fff46de9a80, bstrategy=<optimized out>, bstrategy(at)entry=0x0, isTopLevel=isTopLevel(at)entry=true) at vacuum.c:475
#8 0x000000000069065e in ExecVacuum (pstate=pstate(at)entry=0x2dc38d0, vacstmt=vacstmt(at)entry=0x2d9f3a0, isTopLevel=isTopLevel(at)entry=true) at vacuum.c:275
#9 0x00000000008567b0 in standard_ProcessUtility (pstmt=pstmt(at)entry=0x2d9f7a0, queryString=queryString(at)entry=0x2d9e8a0 "VACUUM ANALYZE alarms", readOnlyTree=<optimized out>, context=context(at)entry=PROCESS_UTILITY_TOPLEVEL,
params=params(at)entry=0x0, queryEnv=queryEnv(at)entry=0x0, dest=dest(at)entry=0x2d9f890, qc=qc(at)entry=0x7fff46dea0c0) at utility.c:866
#10 0x00007f52b891c450 in pgss_ProcessUtility (pstmt=0x2d9f7a0, queryString=0x2d9e8a0 "VACUUM ANALYZE alarms", readOnlyTree=<optimized out>, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x2d9f890,
qc=0x7fff46dea0c0) at pg_stat_statements.c:1143
#11 0x000000000085490a in PortalRunUtility (portal=portal(at)entry=0x2e20fc0, pstmt=0x2d9f7a0, isTopLevel=isTopLevel(at)entry=true, setHoldSnapshot=setHoldSnapshot(at)entry=false, dest=0x2d9f890, qc=0x7fff46dea0c0) at pquery.c:1158
#12 0x0000000000854a53 in PortalRunMulti (portal=portal(at)entry=0x2e20fc0, isTopLevel=isTopLevel(at)entry=true, setHoldSnapshot=setHoldSnapshot(at)entry=false, dest=dest(at)entry=0x2d9f890, altdest=altdest(at)entry=0x2d9f890,
qc=qc(at)entry=0x7fff46dea0c0) at pquery.c:1322
#13 0x0000000000855029 in PortalRun (portal=0x2e20fc0, count=9223372036854775807, isTopLevel=<optimized out>, run_once=<optimized out>, dest=0x2d9f890, altdest=0x2d9f890, qc=0x7fff46dea0c0) at pquery.c:791
#14 0x000000000085099b in exec_simple_query (query_string=0x2d9e8a0 "VACUUM ANALYZE alarms") at postgres.c:1250
#15 0x000000000085268a in PostgresMain (dbname=<optimized out>, username=<optimized out>) at postgres.c:4581
#16 0x0000000000496a21 in BackendRun (port=<optimized out>, port=<optimized out>) at postmaster.c:4504
#17 BackendStartup (port=0x2dbe9c0) at postmaster.c:4232
#18 ServerLoop () at postmaster.c:1806
#19 0x00000000007b3980 in PostmasterMain (argc=argc(at)entry=3, argv=argv(at)entry=0x2d99280) at postmaster.c:1478
#20 0x00000000004986a6 in main (argc=3, argv=0x2d99280) at main.c:202

(gdb) p *pvs
$2 = {pcxt = 0x2e84490, indrels = 0x2e84220, nindexes = 8, shared = 0x2aaaaf142380, indstats = 0x2aaaaf1423c0, dead_items = 0x2aaaab142380, buffer_usage = 0x2aaaab142260, wal_usage = 0x2aaaab142220,
will_parallel_vacuum = 0x2e8f750, nindexes_parallel_bulkdel = 5, nindexes_parallel_cleanup = 0, nindexes_parallel_condcleanup = 5, bstrategy = 0x2dbed40, relnamespace = 0x0, relname = 0x0, indname = 0x0,
status = PARALLEL_INDVAC_STATUS_INITIAL}

(gdb) info locals
indstats = <optimized out>
i = <optimized out>
nworkers = 2

(gdb) p *pvs
$4 = {pcxt = 0x2e84490, indrels = 0x2e84220, nindexes = 8, shared = 0x2aaaaf142380, indstats = 0x2aaaaf1423c0, dead_items = 0x2aaaab142380, buffer_usage = 0x2aaaab142260, wal_usage = 0x2aaaab142220,
will_parallel_vacuum = 0x2e8f750, nindexes_parallel_bulkdel = 5, nindexes_parallel_cleanup = 0, nindexes_parallel_condcleanup = 5, bstrategy = 0x2dbed40, relnamespace = 0x0, relname = 0x0, indname = 0x0,
status = PARALLEL_INDVAC_STATUS_INITIAL}

I reproduced it like this:

pryzbyj=# VACUUM (PARALLEL 2,VERBOSE,INDEX_CLEANUP on) alarms; -- DISABLE_PAGE_SKIPPING true
INFO: vacuuming "pryzbyj.public.alarms"
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
The connection to the server was lost. Attempting reset: Failed.

So I'll be back shortly with more...

--
Justin

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nikita Malakhov 2022-08-18 13:38:31 Re: RFC: Moving specific attributes from pg_attribute column into attoptions
Previous Message Amit Kapila 2022-08-18 11:44:59 Re: Perform streaming logical transactions by background workers and parallel apply