Re: 048_vacuum_horizon_floor.pl hangs due to wakeup lost inside LockBufferForCleanup

From: Xuneng Zhou <xunengzhou(at)gmail(dot)com>
To: Alexander Lakhin <exclusion(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 048_vacuum_horizon_floor.pl hangs due to wakeup lost inside LockBufferForCleanup
Date: 2026-06-22 08:14:08
Message-ID: CABPTF7W6PsSCOby4JDo9UZunsbHu91FHB0u1K0WUvzWjQ1WRJg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Alexander,

On Thu, Jun 18, 2026 at 4:00 AM Alexander Lakhin <exclusion(at)gmail(dot)com> wrote:
>
> Hello hackers,
>
> A recent failure produced by skink [1] apparently materializes an
> interesting race condition:
> 398/398 recovery - postgresql:recovery/048_vacuum_horizon_floor TIMEOUT 15000.51s killed by signal 15 SIGTERM
>
> regress_log_048_vacuum_horizon_floor contains:
> [05:01:16.990](0.000s) ok 3 - Cursor query returned 7 from second fetch. Expected value 7.
> [09:08:43.280](14846.290s) # die: death by signal at
> /home/bf/bf-build/skink-master/HEAD/pgsql/src/test/perl/PostgreSQL/Test/Cluster.pm line 181.
> [09:08:43.281](0.001s) 1..3
> death by signal at /home/bf/bf-build/skink-master/HEAD/pgsql/src/test/perl/PostgreSQL/Test/Cluster.pm line 181.
>
> 048_vacuum_horizon_floor_primary.log contains:
> 2026-06-14 05:00:51.700 CEST [655012][client backend][1/9:0] LOG: statement: VACUUM (VERBOSE, FREEZE, PARALLEL 0)
> vac_horizon_floor_table;
> 2026-06-14 05:00:51.887 CEST [655012][client backend][1/10:0] INFO: aggressively vacuuming
> "test_db.public.vac_horizon_floor_table"
> 2026-06-14 05:00:53.494 CEST [669376][client backend][5/2:0] LOG: statement: SELECT count(*) >= 1 FROM pg_stat_activity
> WHERE pid = 655012
> AND wait_event = 'BufferCleanup';
> 2026-06-14 05:01:07.789 CEST [676703][walsender][26/0:0] LOG: received replication command: IDENTIFY_SYSTEM
> 2026-06-14 05:01:07.789 CEST [676703][walsender][26/0:0] STATEMENT: IDENTIFY_SYSTEM
> 2026-06-14 05:01:07.992 CEST [676703][walsender][26/0:0] LOG: received replication command: START_REPLICATION 0/03000000
> TIMELINE 1
> 2026-06-14 05:01:07.992 CEST [676703][walsender][26/0:0] STATEMENT: START_REPLICATION 0/03000000 TIMELINE 1
> 2026-06-14 05:01:11.486 CEST [678834][client backend][6/2:0] LOG: statement: SELECT EXISTS (SELECT * FROM
> pg_stat_replication);
> 2026-06-14 05:01:15.472 CEST [656147][client backend][2/2:0] LOG: statement: FETCH vac_horizon_floor_cursor1
> ;
> 2026-06-14 05:01:19.594 CEST [681642][client backend][7/2:0] LOG: statement: SELECT index_vacuum_count > 0
> FROM pg_stat_progress_vacuum
> WHERE datname='test_db' AND relid::regclass = 'vac_horizon_floor_table'::regclass;
> 2026-06-14 05:01:24.183 CEST [656147][client backend][2/2:0] LOG: statement: COMMIT;
> 2026-06-14 05:01:26.473 CEST [684424][client backend][8/2:0] LOG: statement: SELECT vacuum_count > 0
> FROM pg_stat_all_tables WHERE relname = 'vac_horizon_floor_table';
> ...
> 2026-06-14 09:08:40.354 CEST [3155195][client backend][7/2090:0] LOG: statement: SELECT vacuum_count > 0
> FROM pg_stat_all_tables WHERE relname = 'vac_horizon_floor_table';
> 2026-06-14 09:08:42.036 CEST [3156594][client backend][8/2090:0] LOG: statement: SELECT vacuum_count > 0
> FROM pg_stat_all_tables WHERE relname = 'vac_horizon_floor_table';
> 2026-06-14 09:08:43.285 CEST [636342][postmaster][:0] LOG: received immediate shutdown request
>
>
> The next run of the test [2] took about 2 minutes:
> 53/398 recovery - postgresql:recovery/048_vacuum_horizon_floor OK 120.90s 3 subtests passed
>
> There was a similar failure in the past year [3]:
> 38/267 postgresql:recovery / recovery/043_vacuum_horizon_floor ERROR 335.83s exit status 29
>
> regress_log_043_vacuum_horizon_floor
> [16:59:38.183](0.003s) ok 3 - Cursor query returned 1 from second fetch. Expected value 1.
> IPC::Run: timeout on timer #2 at /usr/share/perl5/IPC/Run.pm line 3025.
> # Postmaster PID for node "primary" is 2872485
>
> But that time it looked more like an ordinary timeout.
>
> I've managed to reproduce such a failure locally, on a slowed down VM,
> under Valgrind:
> ITERATION 23
> Sun Jun 14 01:02:30 PM UTC 2026
> # +++ tap check in src/test/recovery +++
> t/048_vacuum_horizon_floor.pl .. ok
> All tests successful.
> Files=1, Tests=3, 131 wallclock secs ( 0.01 usr 0.00 sys + 46.21 cusr 3.12 csys = 49.34 CPU)
> Result: PASS
> ITERATION 24
> Sun Jun 14 01:04:41 PM UTC 2026
> # +++ tap check in src/test/recovery +++
> t/048_vacuum_horizon_floor.pl .. 3/? # die: IPC::Run: timeout on timer #2 at /usr/share/perl5/IPC/Run.pm line 3361.
> # Looks like your test exited with 29 just after 3.
> t/048_vacuum_horizon_floor.pl .. Dubious, test returned 29 (wstat 7424, 0x1d00)
>
> With this diagnostic addition:
> --- a/src/backend/storage/buffer/bufmgr.c
> +++ b/src/backend/storage/buffer/bufmgr.c
> @@ -6743,9 +6743,12 @@ LockBufferForCleanup(Buffer buffer)
> }
> bufHdr->wait_backend_pgprocno = MyProcNumber;
> PinCountWaitBuf = bufHdr;
> - UnlockBufHdrExt(bufHdr, buf_state,
> - BM_PIN_COUNT_WAITER, 0,
> - 0);
> +for (volatile int i = 0; i < 10000000; i++);
> + buf_state = UnlockBufHdrExt(bufHdr, buf_state,
> + BM_PIN_COUNT_WAITER, 0,
> + 0);
> + if (BUF_STATE_GET_REFCOUNT(buf_state) == 1)
> + elog(LOG, "!!!LockBufferForCleanup| wakeup lost");
> LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
>
> /* Wait to be signaled by UnpinBuffer() */
> @@ -6794,7 +6797,11 @@ LockBufferForCleanup(Buffer buffer)
> SetStartupBufferPinWaitBufId(-1);
> }
> else
> +{
> +elog(LOG, "!!!LockBufferForCleanup]| before ProcWaitForSignal");
> ProcWaitForSignal(WAIT_EVENT_BUFFER_CLEANUP);
> +elog(LOG, "!!!LockBufferForCleanup]| after ProcWaitForSignal");
> +}
>
> /*
> * Remove flag marking us as waiter. Normally this will not be set
>
> (The delay before UnlockBufHdrExt() helps reproducing for me --- the test
> fails within 10 iterations.)
>
> I could see:
> # +++ tap check in src/test/recovery +++
> t/048_vacuum_horizon_floor.pl .. 3/? # die: IPC::Run: timeout on timer #2 at /usr/share/perl5/IPC/Run.pm line 3156.
> # Looks like your test exited with 29 just after 3.
> t/048_vacuum_horizon_floor.pl .. Dubious, test returned 29 (wstat 7424, 0x1d00)
> All 3 subtests passed
>
> Test Summary Report
> -------------------
> t/048_vacuum_horizon_floor.pl (Wstat: 7424 (exited 29) Tests: 3 Failed: 0)
> Non-zero exit status: 29
> Files=1, Tests=3, 318 wallclock secs ( 0.01 usr 0.00 sys + 93.12 cusr 4.73 csys = 97.86 CPU)
>
> 048_vacuum_horizon_floor_primary.log contains:
> 2026-06-17 19:00:31.646 UTC [178421] [048_vacuum_horizon_floor.pl:2/10/0] LOG: !!!LockBufferForCleanup| wakeup lost
> 2026-06-17 19:00:31.646 UTC [178421] [048_vacuum_horizon_floor.pl:2/10/0] CONTEXT: while scanning block 26 of relation
> "public.vac_horizon_floor_table"
> 2026-06-17 19:00:31.646 UTC [178421] [048_vacuum_horizon_floor.pl:2/10/0] STATEMENT: VACUUM (VERBOSE, FREEZE, PARALLEL
> 0) vac_horizon_floor_table;
> 2026-06-17 19:00:31.646 UTC [178421] [048_vacuum_horizon_floor.pl:2/10/0] LOG: !!!LockBufferForCleanup]| before
> ProcWaitForSignal
> 2026-06-17 19:00:31.646 UTC [178421] [048_vacuum_horizon_floor.pl:2/10/0] CONTEXT: while scanning block 26 of relation
> "public.vac_horizon_floor_table"
> 2026-06-17 19:00:31.646 UTC [178421] [048_vacuum_horizon_floor.pl:2/10/0] STATEMENT: VACUUM (VERBOSE, FREEZE, PARALLEL
> 0) vac_horizon_floor_table;
> ...
> 2026-06-17 19:04:36.835 UTC [178421] [048_vacuum_horizon_floor.pl:2/10/0] LOG: !!!LockBufferForCleanup]| after
> ProcWaitForSignal
> 2026-06-17 19:04:36.835 UTC [178421] [048_vacuum_horizon_floor.pl:2/10/0] CONTEXT: while scanning block 26 of relation
> "public.vac_horizon_floor_table"
> 2026-06-17 19:04:36.835 UTC [178421] [048_vacuum_horizon_floor.pl:2/10/0] STATEMENT: VACUUM (VERBOSE, FREEZE, PARALLEL
> 0) vac_horizon_floor_table;
>
>
> [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2026-06-14%2001%3A21%3A38
> [2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2026-06-14%2007%3A22%3A09
> [3] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2025-03-05%2015%3A43%3A53

Thanks for reporting this issue. I can reproduce it with the delay.
This looks like a real lost-wakeup race in LockBufferForCleanup().

The relevant sequence in failed test 048 is:

- Session B opens a cursor and fetches one heap tuple, leaving a heap
buffer pinned.
- Session A starts VACUUM (FREEZE).
- VACUUM reaches that page and waits in LockBufferForCleanup().
- Session B later advances/closes the cursor, releasing the pin.
- VACUUM is expected to wake up and finish.

In the failure, the tap test had already passed its three sql
assertions, but then it timed out waiting for VACUUM completion:

ok 3 - Cursor query returned 7 from second fetch
poll_query_until timed out:
SELECT vacuum_count > 0 ...
last actual query output: f
Looks like your test exited with 29 just after 3.

The diagnostic log shows the actual race:

LOG: !!!LockBufferForCleanup| wakeup lost
CONTEXT: while scanning block 90 of relation "public.vac_horizon_floor_table"

LOG: !!!LockBufferForCleanup]| before ProcWaitForSignal
CONTEXT: while scanning block 90 of relation "public.vac_horizon_floor_table"

There was no later "after ProcWaitForSignal" before the shutdown,
which implies that VACUUM published itself as a waiter, entered
ProcWaitForSignal(), and not been woken up later.

The direct regression appears to be 5310fac6e0f. It allows this interleaving:

W: LockBufferForCleanup() holds buffer header lock
W: observes refcount > 1
P: releases the last competing pin with atomic fetch_sub
P: old state does not contain BM_PIN_COUNT_WAITER, so no wakeup
W: publishes BM_PIN_COUNT_WAITER
W: sleeps in ProcWaitForSignal()

At this point the condition W wanted is already true: refcount is 1,
meaning only W's own pin remains. So W could sleep indefinitely as no
future unpin to wake it.

We can fix this with the state returned by UnlockBufHdrExt() when
publishing BM_PIN_COUNT_WAITER. If the wait refcount is 1, do not
enter the wait path. Instead, fall through to the existing waiter-bit
cleanup and retry the loop to acquire the cleanup lock normally. The
reproducer test passed after applying the patch.

--
Regards,
Xuneng Zhou
HighGo Software Co., Ltd.

Attachment Content-Type Size
v1-0001-Fix-lost-wakeup-in-LockBufferForCleanup.patch application/octet-stream 2.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Anton Voloshin 2026-06-22 08:15:50 Re: Reorganize GUC structs
Previous Message Chao Li 2026-06-22 08:10:22 Re: pgbench --continue-on-error: clarify TPS and failure reporting