| From: | Andres Freund <andres(at)anarazel(dot)de> |
|---|---|
| To: | Xuneng Zhou <xunengzhou(at)gmail(dot)com> |
| Cc: | Alexander Lakhin <exclusion(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: 048_vacuum_horizon_floor.pl hangs due to wakeup lost inside LockBufferForCleanup |
| Date: | 2026-06-22 20:11:23 |
| Message-ID: | uos73dsx5m5pyf73ixkr6khx5gn2m54dhwaa7lg2ezrvitedlj@o7ciyk5px3u2 |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi,
On 2026-06-22 16:14:08 +0800, Xuneng Zhou wrote:
> This looks like a real lost-wakeup race in LockBufferForCleanup().
I don't think any wakeup is lost. The problem is that we're waiting for a
wakeup when it is not guaranteed that one will arrive.
> The relevant sequence in failed test 048 is:
>
> - Session B opens a cursor and fetches one heap tuple, leaving a heap
> buffer pinned.
> - Session A starts VACUUM (FREEZE).
> - VACUUM reaches that page and waits in LockBufferForCleanup().
> - Session B later advances/closes the cursor, releasing the pin.
> - VACUUM is expected to wake up and finish.
>
> In the failure, the tap test had already passed its three sql
> assertions, but then it timed out waiting for VACUUM completion:
>
> ok 3 - Cursor query returned 7 from second fetch
> poll_query_until timed out:
> SELECT vacuum_count > 0 ...
> last actual query output: f
> Looks like your test exited with 29 just after 3.
>
> The diagnostic log shows the actual race:
>
> LOG: !!!LockBufferForCleanup| wakeup lost
> CONTEXT: while scanning block 90 of relation "public.vac_horizon_floor_table"
> LOG: !!!LockBufferForCleanup]| before ProcWaitForSignal
> CONTEXT: while scanning block 90 of relation "public.vac_horizon_floor_table"
I don't think this is meaningful output, as written upthread.
> There was no later "after ProcWaitForSignal" before the shutdown,
> which implies that VACUUM published itself as a waiter, entered
> ProcWaitForSignal(), and not been woken up later.
> The direct regression appears to be 5310fac6e0f. It allows this interleaving:
>
> W: LockBufferForCleanup() holds buffer header lock
> W: observes refcount > 1
> P: releases the last competing pin with atomic fetch_sub
> P: old state does not contain BM_PIN_COUNT_WAITER, so no wakeup
> W: publishes BM_PIN_COUNT_WAITER
> W: sleeps in ProcWaitForSignal()
>
> At this point the condition W wanted is already true: refcount is 1,
> meaning only W's own pin remains. So W could sleep indefinitely as no
> future unpin to wake it.
However, this indeed seems to be the problem.
> We can fix this with the state returned by UnlockBufHdrExt() when
> publishing BM_PIN_COUNT_WAITER. If the wait refcount is 1, do not
> enter the wait path. Instead, fall through to the existing waiter-bit
> cleanup and retry the loop to acquire the cleanup lock normally. The
> reproducer test passed after applying the patch.
Why are we retrying, when we already achieved everything we wanted? Seems we
should just unset BM_PIN_COUNT_WAITER / wait_backend_pgprocno. I think the
easiest way to do this would be to not unlock the buffer header lock, but just
atomically-or-in BM_PIN_COUNT_WAITER and then recheck if we already are done,
while still holding the header lock.
Greetings,
Andres Freund
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Tom Lane | 2026-06-22 20:15:57 | Re: The PostgreSQL C Dialect |
| Previous Message | Andres Freund | 2026-06-22 20:03:56 | Re: 048_vacuum_horizon_floor.pl hangs due to wakeup lost inside LockBufferForCleanup |