Assertion failure on PG15 with modified test_shm_mq test

From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Andres Freund <andres(at)anarazel(dot)de>
Subject: Assertion failure on PG15 with modified test_shm_mq test
Date: 2022-08-17 05:45:28
Message-ID: CABOikdM+cR6fDd1XkGu0paKWaa1E_s4WRG3oETw8u_U5NK1oeg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

I've a slightly modified version of test_shm_mq, that I changed to include
a shared fileset. The motivation to do that came because I hit an
assertion failure with PG15 while doing some development work on BDR and I
suspected it to be a PG15 bug.

The stack trace looks as below:

(lldb) bt
* thread #1
* frame #0: 0x00007ff8187b100e libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x00007ff8187e71ff libsystem_pthread.dylib`pthread_kill + 263
frame #2: 0x00007ff818732d24 libsystem_c.dylib`abort + 123
frame #3: 0x000000010fce1bab
postgres`ExceptionalCondition(conditionName="pgstat_is_initialized &&
!pgstat_is_shutdown", errorType="FailedAssertion", fileName="pgstat.c",
lineNumber=1227) at assert.c:69:2
frame #4: 0x000000010fb06412 postgres`pgstat_assert_is_up at
pgstat.c:1227:2
frame #5: 0x000000010fb0d2c7
postgres`pgstat_get_entry_ref(kind=PGSTAT_KIND_DATABASE, dboid=0, objoid=0,
create=true, created_entry=0x0000000000000000) at pgstat_shmem.c:406:2
frame #6: 0x000000010fb07579
postgres`pgstat_prep_pending_entry(kind=PGSTAT_KIND_DATABASE, dboid=0,
objoid=0, created_entry=0x0000000000000000) at pgstat.c:1068:14
frame #7: 0x000000010fb09cce
postgres`pgstat_prep_database_pending(dboid=0) at pgstat_database.c:327:14
frame #8: 0x000000010fb09dff
postgres`pgstat_report_tempfile(filesize=0) at pgstat_database.c:179:10
frame #9: 0x000000010fa8dbe9
postgres`ReportTemporaryFileUsage(path="base/pgsql_tmp/pgsql_tmp17312.0.fileset/test_mq_sharefile.0",
size=0) at fd.c:1521:2
frame #10: 0x000000010fa8db3c
postgres`PathNameDeleteTemporaryFile(path="base/pgsql_tmp/pgsql_tmp17312.0.fileset/test_mq_sharefile.0",
error_on_failure=false) at fd.c:1945:3
frame #11: 0x000000010fa8d3a8
postgres`unlink_if_exists_fname(fname="base/pgsql_tmp/pgsql_tmp17312.0.fileset/test_mq_sharefile.0",
isdir=false, elevel=15) at fd.c:3674:3
frame #12: 0x000000010fa8d270
postgres`walkdir(path="base/pgsql_tmp/pgsql_tmp17312.0.fileset",
action=(postgres`unlink_if_exists_fname at fd.c:3663),
process_symlinks=false, elevel=15) at fd.c:3573:5
frame #13: 0x000000010fa8d0e2
postgres`PathNameDeleteTemporaryDir(dirname="base/pgsql_tmp/pgsql_tmp17312.0.fileset")
at fd.c:1689:2
frame #14: 0x000000010fa91ac1
postgres`FileSetDeleteAll(fileset=0x0000000119240870) at fileset.c:165:3
frame #15: 0x000000010fa92b08
postgres`SharedFileSetOnDetach(segment=0x00007f93ff00a7c0,
datum=4716759152) at sharedfileset.c:119:3
frame #16: 0x000000010fa96b00
postgres`dsm_detach(seg=0x00007f93ff00a7c0) at dsm.c:801:3
frame #17: 0x000000010fa96f51 postgres`dsm_backend_shutdown at
dsm.c:738:3
frame #18: 0x000000010fa99402 postgres`shmem_exit(code=1) at ipc.c:259:2
frame #19: 0x000000010fa99227 postgres`proc_exit_prepare(code=1) at
ipc.c:194:2
frame #20: 0x000000010fa99133 postgres`proc_exit(code=1) at ipc.c:107:2
frame #21: 0x000000010fce318c postgres`errfinish(filename="postgres.c",
lineno=3204, funcname="ProcessInterrupts") at elog.c:661:3
frame #22: 0x000000010fad7c51 postgres`ProcessInterrupts at
postgres.c:3201:4
frame #23: 0x000000011924d85b
test_shm_mq.so`test_shm_mq_main(main_arg=1155036180) at worker.c:159:2
frame #24: 0x000000010f9da11e postgres`StartBackgroundWorker at
bgworker.c:858:2
frame #25: 0x000000010f9e80b4
postgres`do_start_bgworker(rw=0x00007f93ef904080) at postmaster.c:5823:4
frame #26: 0x000000010f9e2524 postgres`maybe_start_bgworkers at
postmaster.c:6047:9
frame #27: 0x000000010f9e0e63
postgres`sigusr1_handler(postgres_signal_arg=30) at postmaster.c:5204:3
frame #28: 0x00007ff8187fcdfd libsystem_platform.dylib`_sigtramp + 29
frame #29: 0x00007ff8187b2d5b libsystem_kernel.dylib`__select + 11
frame #30: 0x000000010f9e268d postgres`ServerLoop at
postmaster.c:1770:13
frame #31: 0x000000010f9e0157 postgres`PostmasterMain(argc=8,
argv=0x0000600002f30190) at postmaster.c:1478:11
frame #32: 0x000000010f8bc930 postgres`main(argc=8,
argv=0x0000600002f30190) at main.c:202:3
frame #33: 0x000000011f7d651e dyld`start + 462

I notice that pgstat_shutdown_hook() is registered as a before-shmem-exit
callback. The callback is responsible for detaching from the pgstat shared
memory segment. But looks like other parts of the system still expect it to
be available during later stages of proc exit.

It's not clear to me if pgstat shutdown should happen later or code that
gets executed later in the cycle should not try to use pgstat. It's also
entirely possible that my usage of SharedFileSet is completely wrong. If
that's the case, please let me know the mistake in the usage.

Patch modifying the test case is attached. In order to reproduce the
problem quickly, I added a CHECK_FOR_INTERRUPTS() in the test, but I don't
see why that would be a bad coding practice.

Thanks,
Pavan

Attachment Content-Type Size
crash_pgstat.patch application/octet-stream 5.3 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2022-08-17 05:47:24 Re: add checkpoint stats of snapshot and mapping files of pg_logical dir
Previous Message Peter Eisentraut 2022-08-17 05:26:16 Remove dummyret definition