Re: 048_vacuum_horizon_floor.pl hangs due to wakeup lost inside LockBufferForCleanup

From: Andres Freund <andres(at)anarazel(dot)de>
To: Xuneng Zhou <xunengzhou(at)gmail(dot)com>
Cc: Alexander Lakhin <exclusion(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 048_vacuum_horizon_floor.pl hangs due to wakeup lost inside LockBufferForCleanup
Date: 2026-06-22 20:11:23
Message-ID: uos73dsx5m5pyf73ixkr6khx5gn2m54dhwaa7lg2ezrvitedlj@o7ciyk5px3u2
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2026-06-22 16:14:08 +0800, Xuneng Zhou wrote:
> This looks like a real lost-wakeup race in LockBufferForCleanup().

I don't think any wakeup is lost. The problem is that we're waiting for a
wakeup when it is not guaranteed that one will arrive.

> The relevant sequence in failed test 048 is:
>
> - Session B opens a cursor and fetches one heap tuple, leaving a heap
> buffer pinned.
> - Session A starts VACUUM (FREEZE).
> - VACUUM reaches that page and waits in LockBufferForCleanup().
> - Session B later advances/closes the cursor, releasing the pin.
> - VACUUM is expected to wake up and finish.
>
> In the failure, the tap test had already passed its three sql
> assertions, but then it timed out waiting for VACUUM completion:
>
> ok 3 - Cursor query returned 7 from second fetch
> poll_query_until timed out:
> SELECT vacuum_count > 0 ...
> last actual query output: f
> Looks like your test exited with 29 just after 3.
>
> The diagnostic log shows the actual race:
>
> LOG: !!!LockBufferForCleanup| wakeup lost
> CONTEXT: while scanning block 90 of relation "public.vac_horizon_floor_table"

> LOG: !!!LockBufferForCleanup]| before ProcWaitForSignal
> CONTEXT: while scanning block 90 of relation "public.vac_horizon_floor_table"

I don't think this is meaningful output, as written upthread.

> There was no later "after ProcWaitForSignal" before the shutdown,
> which implies that VACUUM published itself as a waiter, entered
> ProcWaitForSignal(), and not been woken up later.

> The direct regression appears to be 5310fac6e0f. It allows this interleaving:
>
> W: LockBufferForCleanup() holds buffer header lock
> W: observes refcount > 1
> P: releases the last competing pin with atomic fetch_sub
> P: old state does not contain BM_PIN_COUNT_WAITER, so no wakeup
> W: publishes BM_PIN_COUNT_WAITER
> W: sleeps in ProcWaitForSignal()
>
> At this point the condition W wanted is already true: refcount is 1,
> meaning only W's own pin remains. So W could sleep indefinitely as no
> future unpin to wake it.

However, this indeed seems to be the problem.

> We can fix this with the state returned by UnlockBufHdrExt() when
> publishing BM_PIN_COUNT_WAITER. If the wait refcount is 1, do not
> enter the wait path. Instead, fall through to the existing waiter-bit
> cleanup and retry the loop to acquire the cleanup lock normally. The
> reproducer test passed after applying the patch.

Why are we retrying, when we already achieved everything we wanted? Seems we
should just unset BM_PIN_COUNT_WAITER / wait_backend_pgprocno. I think the
easiest way to do this would be to not unlock the buffer header lock, but just
atomically-or-in BM_PIN_COUNT_WAITER and then recheck if we already are done,
while still holding the header lock.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2026-06-22 20:15:57 Re: The PostgreSQL C Dialect
Previous Message Andres Freund 2026-06-22 20:03:56 Re: 048_vacuum_horizon_floor.pl hangs due to wakeup lost inside LockBufferForCleanup