From: | Rahila Syed <rahilasyed90(at)gmail(dot)com> |
---|---|
To: | tharakan(at)gmail(dot)com |
Cc: | pgsql-bugs(at)lists(dot)postgresql(dot)org, Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
Subject: | Re: BUG #18992: Autovacuum triggering assert - LWLockAnyHeldByMe |
Date: | 2025-07-21 07:21:14 |
Message-ID: | CAH2L28uqTCVCCSzdCXtQhNDQxKNDfB=0Ue0Z0ZXqdrK5p23XtA@mail.gmail.com |
Lists: | pgsql-bugs |
Hi,
This appears to be a valid issue where the Autovacuum worker fails while already holding an
LWLock on one of the pgStatLocal.shared_hash partitions. As a result, when we attempt to
access this table again during proc_exit cleanup in dshash_find, the assert is triggered.
I haven’t yet checked exactly where the lock is acquired within the Autovacuum worker,
but as Dilip mentioned, reviewing where the error occurs in the Autovacuum worker would be helpful.
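
To make the failure mode concrete, here is a toy model in C. It is not PostgreSQL source: the names toy_lock, toy_unlock, toy_any_held, toy_find_allowed and TOY_NUM_PARTITIONS are hypothetical stand-ins, and the partition count of 128 is an assumption. It only sketches the idea that a lookup first checks that the calling process holds none of the partition locks, so a lock left behind on an earlier code path makes a later lookup trip that check.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed partition count; stands in for DSHASH_NUM_PARTITIONS. */
#define TOY_NUM_PARTITIONS 128

/* Toy per-process record of which partition locks are currently held. */
static bool toy_held[TOY_NUM_PARTITIONS];

/* Hypothetical helpers: mark a partition lock as taken or released. */
void toy_lock(int partition)   { toy_held[partition] = true; }
void toy_unlock(int partition) { toy_held[partition] = false; }

/* Rough analogue of LWLockAnyHeldByMe() applied to the partition array. */
bool toy_any_held(void)
{
    for (int i = 0; i < TOY_NUM_PARTITIONS; i++)
    {
        if (toy_held[i])
            return true;
    }
    return false;
}

/*
 * Analogue of the entry check reported at dshash.c:400. Returns false in
 * the situation where the real code would hit the failed Assert.
 */
bool toy_find_allowed(void)
{
    return !toy_any_held();
}

/* Toy lookup: aborts, like an assert-enabled build, if a lock is still held. */
int toy_find(int key)
{
    assert(toy_find_allowed());
    return key % TOY_NUM_PARTITIONS;    /* partition the key would map to */
}
```

In this model, calling toy_lock(4) and then reaching toy_find() without a matching toy_unlock(4) aborts the process, which is the same shape as the crash in the report; partition 4 here is an arbitrary choice and says nothing about which lock the worker actually held.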
Thank you,
Rahila Syed
On Mon, Jul 21, 2025 at 11:22 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> On Sun, Jul 20, 2025 at 9:35 PM PG Bug reporting form
> <noreply(at)postgresql(dot)org> wrote:
> >
> > The following bug has been logged on the website:
> >
> > Bug reference: 18992
> > Logged by: Robins Tharakan
> > Email address: tharakan(at)gmail(dot)com
> > PostgreSQL version: 18beta2
> > Operating system: Ubuntu
> > Description:
> >
> > I couldn't reproduce the assert at will, but the test setup has crashed
> > three times since yesterday, hence this report. Although this was
> > on a recent commit (1e9b5140c44), the test had not been run for
> > a few weeks before this, so the issue may not be recent.
> >
> >
> > Error Log
> > =========
> > TRAP: failed Assert("!LWLockAnyHeldByMe(&(hash_table)->control->partitions[0].lock, DSHASH_NUM_PARTITIONS, sizeof(dshash_partition))"), File: "dshash.c", Line: 400, PID: 147794
> > postgres: 1e9b5140c44@sqith: autovacuum worker (ExceptionalCondition+0xbb)[0x5a609cb46036]
> > postgres: 1e9b5140c44@sqith: autovacuum worker (dshash_find+0xab)[0x5a609c6a81f2]
> > postgres: 1e9b5140c44@sqith: autovacuum worker (pgstat_drop_entry+0xc2)[0x5a609c968cb7]
> > postgres: 1e9b5140c44@sqith: autovacuum worker (+0x73c443)[0x5a609c95c443]
> > postgres: 1e9b5140c44@sqith: autovacuum worker (shmem_exit+0xa6)[0x5a609c8ef83e]
> > postgres: 1e9b5140c44@sqith: autovacuum worker (+0x6cf6e2)[0x5a609c8ef6e2]
> > postgres: 1e9b5140c44@sqith: autovacuum worker (proc_exit+0x74)[0x5a609c8ef626]
> > postgres: 1e9b5140c44@sqith: autovacuum worker (AutoVacWorkerMain+0x19c)[0x5a609c8231ad]
> > postgres: 1e9b5140c44@sqith: autovacuum worker (postmaster_child_launch+0x174)[0x5a609c82ad34]
> > postgres: 1e9b5140c44@sqith: autovacuum worker (+0x612191)[0x5a609c832191]
> > postgres: 1e9b5140c44@sqith: autovacuum worker (+0x612328)[0x5a609c832328]
> > postgres: 1e9b5140c44@sqith: autovacuum worker (+0x611dd5)[0x5a609c831dd5]
> > postgres: 1e9b5140c44@sqith: autovacuum worker (+0x60ec84)[0x5a609c82ec84]
> > postgres: 1e9b5140c44@sqith: autovacuum worker (PostmasterMain+0x1546)[0x5a609c82e5e4]
> > postgres: 1e9b5140c44@sqith: autovacuum worker (main+0x38c)[0x5a609c6ca6f3]
> > /lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca)[0x7f726022a1ca]
> > /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b)[0x7f726022a28b]
> > postgres: 1e9b5140c44@sqith: autovacuum worker (_start+0x25)[0x5a609c307fb5]
> >
> >
> > 2025-07-19 20:07:07.398 ACST [55365] LOG: autovacuum worker (PID 147794) was terminated by signal 6: Aborted
> >
> >
> >
> > 2025-07-20 06:17:50.376 ACST [1190828] FATAL: can't attach the same segment more than once
> > TRAP: failed Assert("!LWLockAnyHeldByMe(&(hash_table)->control->partitions[0].lock, DSHASH_NUM_PARTITIONS, sizeof(dshash_partition))"), File: "dshash.c", Line: 400, PID: 1190928
> > postgres: 1e9b5140c44@sqith: autovacuum worker (ExceptionalCondition+0xbb)[0x56d3cc97c036]
> > postgres: 1e9b5140c44@sqith: autovacuum worker (dshash_find+0xab)[0x56d3cc4de1f2]
> > postgres: 1e9b5140c44@sqith: autovacuum worker (pgstat_drop_entry+0xc2)[0x56d3cc79ecb7]
> > postgres: 1e9b5140c44@sqith: autovacuum worker (+0x73c443)[0x56d3cc792443]
> > postgres: 1e9b5140c44@sqith: autovacuum worker (shmem_exit+0xa6)[0x56d3cc72583e]
> > postgres: 1e9b5140c44@sqith: autovacuum worker (+0x6cf6e2)[0x56d3cc7256e2]
> > postgres: 1e9b5140c44@sqith: autovacuum worker (proc_exit+0x74)[0x56d3cc725626]
> > postgres: 1e9b5140c44@sqith: autovacuum worker (AutoVacWorkerMain+0x19c)[0x56d3cc6591ad]
> > postgres: 1e9b5140c44@sqith: autovacuum worker (postmaster_child_launch+0x174)[0x56d3cc660d34]
> > postgres: 1e9b5140c44@sqith: autovacuum worker (+0x612191)[0x56d3cc668191]
> > postgres: 1e9b5140c44@sqith: autovacuum worker (+0x612328)[0x56d3cc668328]
> > postgres: 1e9b5140c44@sqith: autovacuum worker (+0x611dd5)[0x56d3cc667dd5]
> > postgres: 1e9b5140c44@sqith: autovacuum worker (+0x60ec84)[0x56d3cc664c84]
> > postgres: 1e9b5140c44@sqith: autovacuum worker (PostmasterMain+0x1546)[0x56d3cc6645e4]
> > postgres: 1e9b5140c44@sqith: autovacuum worker (main+0x38c)[0x56d3cc5006f3]
> > /lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca)[0x758b0e42a1ca]
> > /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b)[0x758b0e42a28b]
> > postgres: 1e9b5140c44@sqith: autovacuum worker (_start+0x25)[0x56d3cc13dfb5]
> > 2025-07-20 06:18:22.919 ACST [169020] LOG: autovacuum worker (PID 1190928) was terminated by signal 6: Aborted
> >
> >
> >
> > Backtrace
> > =========
> > (gdb) bt
> > #0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimised out>) at ./nptl/pthread_kill.c:44
> > #1 __pthread_kill_internal (signo=6, threadid=<optimised out>) at ./nptl/pthread_kill.c:78
> > #2 __GI___pthread_kill (threadid=<optimised out>, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
> > #3 0x00007f726024527e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
> > #4 0x00007f72602288ff in __GI_abort () at ./stdlib/abort.c:79
> > #5 0x00005a609cb46069 in ExceptionalCondition (conditionName=0x5a609cd3af40 "!LWLockAnyHeldByMe(&(hash_table)->control->partitions[0].lock, DSHASH_NUM_PARTITIONS, sizeof(dshash_partition))", fileName=0x5a609cd3af03 "dshash.c", lineNumber=400) at assert.c:66
> > #6 0x00005a609c6a81f2 in dshash_find (hash_table=0x5a60ab734b10, key=0x7ffe06d72500, exclusive=true) at dshash.c:400
> > #7 0x00005a609c968cb7 in pgstat_drop_entry (kind=6, dboid=0, objid=5015) at pgstat_shmem.c:988
> > #8 0x00005a609c95c443 in pgstat_shutdown_hook (code=0, arg=0) at pgstat.c:622
> > #9 0x00005a609c8ef83e in shmem_exit (code=0) at ipc.c:243
> > #10 0x00005a609c8ef6e2 in proc_exit_prepare (code=0) at ipc.c:198
> > #11 0x00005a609c8ef626 in proc_exit (code=0) at ipc.c:111
> > #12 0x00005a609c8231ad in AutoVacWorkerMain (startup_data=0x0, startup_data_len=0) at autovacuum.c:1456
> > #13 0x00005a609c82ad34 in postmaster_child_launch (child_type=B_AUTOVAC_WORKER, child_slot=10002, startup_data=0x0, startup_data_len=0, client_sock=0x0) at launch_backend.c:290
> > #14 0x00005a609c832191 in StartChildProcess (type=B_AUTOVAC_WORKER) at postmaster.c:3973
> > #15 0x00005a609c832328 in StartAutovacuumWorker () at postmaster.c:4037
> > #16 0x00005a609c831dd5 in process_pm_pmsignal () at postmaster.c:3794
> > #17 0x00005a609c82ec84 in ServerLoop () at postmaster.c:1695
> > #18 0x00005a609c82e5e4 in PostmasterMain (argc=3, argv=0x5a60ab733940) at postmaster.c:1400
> > #19 0x00005a609c6ca6f3 in main (argc=3, argv=0x5a60ab733940) at main.c:231
> >
> >
> > Backtrace Full
> > ==============
> > #0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimised out>) at ./nptl/pthread_kill.c:44
> > tid = <optimised out>
> > ret = 0
> > pd = <optimised out>
> > old_mask = {__val = {18446744073709551568}}
> > ret = <optimised out>
> > pd = <optimised out>
> > old_mask = <optimised out>
> > ret = <optimised out>
> > tid = <optimised out>
> > ret = <optimised out>
> > resultvar = <optimised out>
> > resultvar = <optimised out>
> > __arg3 = <optimised out>
> > __arg2 = <optimised out>
> > __arg1 = <optimised out>
> > _a3 = <optimised out>
> > _a2 = <optimised out>
> > _a1 = <optimised out>
> > __futex = <optimised out>
> > resultvar = <optimised out>
> > __arg3 = <optimised out>
> > __arg2 = <optimised out>
> > __arg1 = <optimised out>
> > _a3 = <optimised out>
> > _a2 = <optimised out>
> > _a1 = <optimised out>
> > __futex = <optimised out>
> > __private = <optimised out>
> > __oldval = <optimised out>
> > #1 __pthread_kill_internal (signo=6, threadid=<optimised out>) at ./nptl/pthread_kill.c:78
> > No locals.
> > #2 __GI___pthread_kill (threadid=<optimised out>, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
> > No locals.
> > #3 0x00007f726024527e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
> > ret = <optimised out>
> > #4 0x00007f72602288ff in __GI_abort () at ./stdlib/abort.c:79
> > save_stage = 1
> > act = {__sigaction_handler = {sa_handler = 0x20, sa_sigaction = 0x20}, sa_mask = {__val = {2, 140129217590784, 1, 140129217603397, 3, 140729013182628, 12, 140129217603401, 2, 3474071045457511480, 3846749418945733433, 140729013182720, 3833180526369726083, 140729013182784, 16669054559647844352, 140729013186696}}, sa_flags = 114765976, sa_restorer = 0x3}
> > #5 0x00005a609cb46069 in ExceptionalCondition (conditionName=0x5a609cd3af40 "!LWLockAnyHeldByMe(&(hash_table)->control->partitions[0].lock, DSHASH_NUM_PARTITIONS, sizeof(dshash_partition))", fileName=0x5a609cd3af03 "dshash.c", lineNumber=400) at assert.c:66
> > No locals.
> > #6 0x00005a609c6a81f2 in dshash_find (hash_table=0x5a60ab734b10, key=0x7ffe06d72500, exclusive=true) at dshash.c:400
> > hash = 147533127
> > partition = 4
> > item = 0x8cb2d4700000001
> > #7 0x00005a609c968cb7 in pgstat_drop_entry (kind=6, dboid=0, objid=5015) at pgstat_shmem.c:988
> > key = {kind = 6, dboid = 0, objid = 5015}
> > shent = 0x5a609c8c83ec <pgaio_shutdown+406>
> > freed = true
> > #8 0x00005a609c95c443 in pgstat_shutdown_hook (code=0, arg=0) at pgstat.c:622
> > No locals.
> > #9 0x00005a609c8ef83e in shmem_exit (code=0) at ipc.c:243
> > __func__ = "shmem_exit"
> > #10 0x00005a609c8ef6e2 in proc_exit_prepare (code=0) at ipc.c:198
> > __func__ = "proc_exit_prepare"
> > #11 0x00005a609c8ef626 in proc_exit (code=0) at ipc.c:111
> > __func__ = "proc_exit"
> > #12 0x00005a609c8231ad in AutoVacWorkerMain (startup_data=0x0, startup_data_len=0) at autovacuum.c:1456
> > local_sigjmp_buf = {{__jmpbuf = {140729013186712, -5847007501798591079, 3, 0, 99370996794424, 140129240260608, -5847007501689539175, -1880687092341131879}, __mask_was_saved = 1, __saved_mask = {__val = {18446744066192964099, 99370993001878, 0, 99370993001878, 15511593002, 99370993001878, 0, 99371239819984, 99371239819904, 140729013184112, 16669054559647844352, 140729013184224, 140129216851989, 140729013184144, 16669054559647844352, 16}}}}
> > dbid = 4
> > __func__ = "AutoVacWorkerMain"
> > #13 0x00005a609c82ad34 in postmaster_child_launch (child_type=B_AUTOVAC_WORKER, child_slot=10002, startup_data=0x0, startup_data_len=0, client_sock=0x0) at launch_backend.c:290
> > pid = 0
> > #14 0x00005a609c832191 in StartChildProcess (type=B_AUTOVAC_WORKER) at postmaster.c:3973
> > pmchild = 0x7f7260f16378
> > pid = 32766
> > __func__ = "StartChildProcess"
> >
> >
> > Found using SQLSmith.
>
> The call stack shows that it hit the assertion while cleaning up
> stats during proc_exit, apparently because of an earlier error. I am
> not sure how easy this is to locate, but since this is an error path,
> can you share the error reported in the log just before the assertion?
>
> --
> Regards,
> Dilip Kumar
> Google
>
>
>