Re: 048_vacuum_horizon_floor.pl hangs due to wakeup lost inside LockBufferForCleanup

From: Andres Freund <andres(at)anarazel(dot)de>
To: Alexander Lakhin <exclusion(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 048_vacuum_horizon_floor.pl hangs due to wakeup lost inside LockBufferForCleanup
Date: 2026-06-22 20:03:56
Message-ID: dxqtyud5euy6migdaafvpudsbn7bzcr7vjcc4vpfh2ffe3sk5a@zszfoxuiq7pl
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2026-06-17 23:00:00 +0300, Alexander Lakhin wrote:
> Hello hackers,
> With this diagnostic addition:
> --- a/src/backend/storage/buffer/bufmgr.c
> +++ b/src/backend/storage/buffer/bufmgr.c
> @@ -6743,9 +6743,12 @@ LockBufferForCleanup(Buffer buffer)
>                 }
>                 bufHdr->wait_backend_pgprocno = MyProcNumber;
>                 PinCountWaitBuf = bufHdr;
> -               UnlockBufHdrExt(bufHdr, buf_state,
> -                                               BM_PIN_COUNT_WAITER, 0,
> -                                               0);
> +for (volatile int i = 0; i < 10000000; i++);
> +               buf_state = UnlockBufHdrExt(bufHdr, buf_state,
> + BM_PIN_COUNT_WAITER, 0,
> +                                                               0);
> +               if (BUF_STATE_GET_REFCOUNT(buf_state) == 1)
> +                       elog(LOG, "!!!LockBufferForCleanup| wakeup lost");
>                 LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
>

I don't think seeing BUF_STATE_GET_REFCOUNT(buf_state) == 1 guarantees that a
wakeup was lost in any sort of way. Nothing prevents another backend from
acquiring a pin on the buffer while we are waiting for a pincount, so
occasionally seeing BUF_STATE_GET_REFCOUNT() == 1 here is to be expected.

You also can't expect repeatedly doing
UnlockBufHdrExt(set_bits=BM_PIN_COUNT_WAITER) to really make sense, because
BM_PIN_COUNT_WAITER only works if it was set *before* another backend releases
its pin, because otherwise the other backend obviously won't notice that the
flag was set if it only was set after the pg_atomic_fetch_sub_u64() in
UnpinBufferNoOwner().

I don't think the problem is that we are loosing a signal, the problem is that
we need to recheck if BUF_STATE_GET_REFCOUNT(buf_state) == 1 after setting
BM_PIN_COUNT_WAITER. I think this is a bug in (depending on your perspective)
either

commit 5310fac6e0f
Author: Andres Freund <andres(at)anarazel(dot)de>
Date: 2025-11-06 16:43:16 -0500

bufmgr: Use atomic sub for unpinning buffers

or

commit c75ebc657ffce8dab76471da31aafb79fbe3fda2
Author: Andres Freund <andres(at)anarazel(dot)de>
Date: 2025-11-06 16:42:10 -0500

bufmgr: Allow some buffer state modifications while holding header lock

Because before those commits, the buffer could not be unpinned while we held
the buffer header lock. With them, this race opens up.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2026-06-22 20:11:23 Re: 048_vacuum_horizon_floor.pl hangs due to wakeup lost inside LockBufferForCleanup
Previous Message Tom Lane 2026-06-22 19:45:15 Fix unsafe coding in ResourceOwnerReleaseAll()