Re: 048_vacuum_horizon_floor.pl hangs due to wakeup lost inside LockBufferForCleanup

From: Xuneng Zhou <xunengzhou(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Alexander Lakhin <exclusion(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 048_vacuum_horizon_floor.pl hangs due to wakeup lost inside LockBufferForCleanup
Date: 2026-06-23 02:15:00
Message-ID: CABPTF7VJ3jg05n68hf0iYB=ULsvH559ZJGLVjdUowRVZzcnWqg@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Andres,

On Tue, Jun 23, 2026 at 4:04 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> On 2026-06-17 23:00:00 +0300, Alexander Lakhin wrote:
> > Hello hackers,
> > With this diagnostic addition:
> > --- a/src/backend/storage/buffer/bufmgr.c
> > +++ b/src/backend/storage/buffer/bufmgr.c
> > @@ -6743,9 +6743,12 @@ LockBufferForCleanup(Buffer buffer)
> > }
> > bufHdr->wait_backend_pgprocno = MyProcNumber;
> > PinCountWaitBuf = bufHdr;
> > - UnlockBufHdrExt(bufHdr, buf_state,
> > - BM_PIN_COUNT_WAITER, 0,
> > - 0);
> > +for (volatile int i = 0; i < 10000000; i++);
> > + buf_state = UnlockBufHdrExt(bufHdr, buf_state,
> > + BM_PIN_COUNT_WAITER, 0,
> > + 0);
> > + if (BUF_STATE_GET_REFCOUNT(buf_state) == 1)
> > + elog(LOG, "!!!LockBufferForCleanup| wakeup lost");
> > LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
> >
>
> I don't think seeing BUF_STATE_GET_REFCOUNT(buf_state) == 1 guarantees that a
> wakeup was lost in any sort of way. Nothing prevents another backend from
> acquiring a pin on the buffer while we are waiting for a pincount, so
> occasionally seeing BUF_STATE_GET_REFCOUNT() == 1 here is to be expected.

Yeah, the log is misleading as observing
BUF_STATE_GET_REFCOUNT(buf_state) == 1 could be temporary, not
necessarily leading to lost wake-up.

A better wording seems to be:

elog(LOG, "buffer refcount is 1 after publishing cleanup-lock waiter");

> You also can't expect repeatedly doing
> UnlockBufHdrExt(set_bits=BM_PIN_COUNT_WAITER) to really make sense, because
> BM_PIN_COUNT_WAITER only works if it was set *before* another backend releases
> its pin, because otherwise the other backend obviously won't notice that the
> flag was set if it only was set after the pg_atomic_fetch_sub_u64() in
> UnpinBufferNoOwner().

This also makes sense to me. However the reproducer seems not to
repeatedly do UnlockBufHdrExt(set_bits=BM_PIN_COUNT_WAITER). It uses
a for loop as a delay before setting the BM_PIN_COUNT_WAITER, which
seems ok to me. Am I missing sth here?

>
> I don't think the problem is that we are loosing a signal, the problem is that
> we need to recheck if BUF_STATE_GET_REFCOUNT(buf_state) == 1 after setting
> BM_PIN_COUNT_WAITER. I think this is a bug in (depending on your perspective)
> either
>
> commit 5310fac6e0f
> Author: Andres Freund <andres(at)anarazel(dot)de>
> Date: 2025-11-06 16:43:16 -0500
>
> bufmgr: Use atomic sub for unpinning buffers
>
> or
>
> commit c75ebc657ffce8dab76471da31aafb79fbe3fda2
> Author: Andres Freund <andres(at)anarazel(dot)de>
> Date: 2025-11-06 16:42:10 -0500
>
> bufmgr: Allow some buffer state modifications while holding header lock
>
>
> Because before those commits, the buffer could not be unpinned while we held
> the buffer header lock. With them, this race opens up.

I agree with this as the root cause of the issue. Thanks for your analysis!

--
Regards,
Xuneng Zhou
HighGo Software Co., Ltd.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Chao Li 2026-06-23 02:18:11 Re: [PATCH] Add pg_get_table_ddl() to reconstruct CREATE TABLE statements
Previous Message Chao Li 2026-06-23 02:11:52 Fix variadic argument types for pg_get_xxx_ddl() functions