v13: CLUSTER segv with wal_level=minimal and parallel index creation

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Noah Misch <noah(at)leadboat(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Subject: v13: CLUSTER segv with wal_level=minimal and parallel index creation
Date: 2020-09-07 02:37:37
Message-ID: 20200907023737.GA7158@telsasoft.com
Lists: pgsql-hackers

Following a bulk load, a CLUSTER command run by a maintenance script crashed.
This is currently reproducible on that instance, so please let me know if I
can provide more info.

< 2020-09-06 15:44:16.369 MDT >LOG: background worker "parallel worker" (PID 2576) was terminated by signal 6: Aborted
< 2020-09-06 15:44:16.369 MDT >DETAIL: Failed process was running: CLUSTER pg_attribute USING pg_attribute_relid_attnam_index

The crash happens during:
ts=# REINDEX INDEX pg_attribute_relid_attnum_index;
...but not during:
ts=# REINDEX INDEX pg_attribute_relid_attnam_index;

Schema | Name | Type | Owner | Table | Persistence | Size | Description
pg_catalog | pg_attribute_relid_attnam_index | index | postgres | pg_attribute | permanent | 31 MB |
pg_catalog | pg_attribute_relid_attnum_index | index | postgres | pg_attribute | permanent | 35 MB |

I suspect this commit:
|commit c6b92041d Skip WAL for new relfilenodes, under wal_level=minimal.

In fact, I set wal_level=minimal for the bulk load. Note also:
source | name | setting
override | data_checksums | on
configuration file | checkpoint_timeout | 60
configuration file | maintenance_work_mem | 1048576
configuration file | max_wal_senders | 0
configuration file | wal_compression | on
configuration file | wal_level | minimal
configuration file | fsync | off
configuration file | full_page_writes | off
default | server_version | 13beta3

(gdb) bt
#0 0x00007ff9999ad387 in raise () from /lib64/libc.so.6
#1 0x00007ff9999aea78 in abort () from /lib64/libc.so.6
#2 0x0000000000921da5 in ExceptionalCondition (conditionName=conditionName@entry=0xad4078 "relcache_verdict == RelFileNodeSkippingWAL(relation->rd_node)", errorType=errorType@entry=0x977f49 "FailedAssertion",
fileName=fileName@entry=0xad3068 "relcache.c", lineNumber=lineNumber@entry=2976) at assert.c:67
#3 0x000000000091a08b in AssertPendingSyncConsistency (relation=0x7ff99c2a70b8) at relcache.c:2976
#4 AssertPendingSyncs_RelationCache () at relcache.c:3036
#5 0x000000000058e591 in smgrDoPendingSyncs (isCommit=isCommit@entry=true, isParallelWorker=isParallelWorker@entry=true) at storage.c:685
#6 0x000000000053b1a4 in CommitTransaction () at xact.c:2118
#7 0x000000000053b826 in EndParallelWorkerTransaction () at xact.c:5300
#8 0x000000000052fcf7 in ParallelWorkerMain (main_arg=<optimized out>) at parallel.c:1479
#9 0x000000000076047a in StartBackgroundWorker () at bgworker.c:813
#10 0x000000000076d88d in do_start_bgworker (rw=0x23ac110) at postmaster.c:5865
#11 maybe_start_bgworkers () at postmaster.c:6091
#12 0x000000000076e43e in sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster.c:5260
#13 <signal handler called>
#14 0x00007ff999a6c983 in __select_nocancel () from /lib64/libc.so.6
#15 0x00000000004887bc in ServerLoop () at postmaster.c:1691
#16 0x000000000076fb45 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x237d280) at postmaster.c:1400
#17 0x000000000048a83d in main (argc=3, argv=0x237d280) at main.c:210

(gdb) bt f
...
#4 AssertPendingSyncs_RelationCache () at relcache.c:3036
status = {hashp = 0x23cba50, curBucket = 449, curEntry = 0x0}
locallock = <optimized out>
rels = 0x23ff018
maxrels = <optimized out>
nrels = 0
idhentry = <optimized out>
i = <optimized out>
#5 0x000000000058e591 in smgrDoPendingSyncs (isCommit=isCommit@entry=true, isParallelWorker=isParallelWorker@entry=true) at storage.c:685
pending = <optimized out>
nrels = 0
maxrels = 0
srels = 0x0
scan = {hashp = 0x23edf60, curBucket = 9633000, curEntry = 0xe01600 <TopTransactionStateData>}
pendingsync = <optimized out>
#6 0x000000000053b1a4 in CommitTransaction () at xact.c:2118
s = 0xe01600 <TopTransactionStateData>
latestXid = <optimized out>
is_parallel_worker = true
__func__ = "CommitTransaction"
