Re: Buffer locking is special (hints, checksums, AIO writes)

From: Andres Freund <andres(at)anarazel(dot)de>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Melanie Plageman <melanieplageman(at)gmail(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Subject: Re: Buffer locking is special (hints, checksums, AIO writes)
Date: 2026-02-09 01:52:41
Message-ID: u644ma4erj75z46wckuq3szrlnci3wzlevq7brauk2p3v6h2l7@jkd7siijr7hx
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2026-02-07 14:59:34 +0200, Heikki Linnakangas wrote:
> A few minor nitpicks on v12 below. Other than these and the comments I wrote
> in separate emails, looks good to me.
>
> > @@ -371,8 +382,6 @@ _bt_killitems(IndexScanDesc scan)
> > }
> > /*
> > - * Since this can be redone later if needed, mark as dirty hint.
> > - *
> > * Whenever we mark anything LP_DEAD, we also set the page's
> > * BTP_HAS_GARBAGE flag, which is likewise just a hint. (Note that we
> > * only rely on the page-level flag in !heapkeyspace indexes.)
>
> Seems a bit random to remove that.

Fair.

> > +/*
> > + * Try to set a single hint bit in a buffer.
> > + *
> > + * This is a bit faster than BufferBeginSetHintBits() /
> > + * BufferFinishSetHintBits() when setting a single hint bit, but slower than
> > + * the former when setting several hint bits.
> > + */
> > +bool
> > +BufferSetHintBits16(uint16 *ptr, uint16 val, Buffer buffer)
>
> This could use some more explanation. The point is that this does "*ptr =
> val", if it's allowed to set hint bits. That's not obvious. And "single hint
> bit" isn't really accurate, as you could update multiple bits in *ptr with
> one call.

Agreed. I updated it to

* Try to set hint bits on a single 16bit value in a buffer.
*
* If hint bits are allowed to be set, set *ptr = val, try mark the buffer
* dirty and return true. Otherwise false is returned.
*
* *ptr needs to be a pointer to memory within the buffer.
*
* This is a bit faster than BufferBeginSetHintBits() /
* BufferFinishSetHintBits() when setting hints once in a buffer, but slower
* than the former when setting hint bits multiple times in the same buffer.

> > /*
> > * If the buffer was dirty, try to write it out. There is a race
> > * condition here, in that someone might dirty it after we released the
> > * buffer header lock above. We will recheck the dirty bit after
> > * re-locking the buffer header.
> > */
>
> It's not clear what "above" means in that paragraph. Where do we release the
> buffer header lock? In StrategyGetBuffer?
>
> (This is not actually new with this patch; it goes back to commit
> 5e89985928. Before that, there was a call to PinBuffer_Locked() which
> released the spinlock.)

Yea, looks like I should have edited the comment in that commit. Updated to:

* If the buffer was dirty, try to write it out. There is a race
* condition here, another backend could dirty the buffer between
* StrategyGetBuffer() checking that it is not in use and invalidating the
* buffer below. That's addressed by InvalidateVictimBuffer() verifying
* that the buffer is not dirty.

> > @@ -2516,18 +2515,21 @@ again:
> > /*
> > * If using a nondefault strategy, and writing the buffer would
> > * require a WAL flush, let the strategy decide whether to go ahead
> > - * and write/reuse the buffer or to choose another victim. We need a
> > - * lock to inspect the page LSN, so this can't be done inside
> > + * and write/reuse the buffer or to choose another victim. We need to
> > + * hold the content lock in at least share-exclusive mode to safely
> > + * inspect the page LSN, so this couldn't have been done inside
> > * StrategyGetBuffer.
> > */
> > if (strategy != NULL)
> > {
> > XLogRecPtr lsn;
> > - /* Read the LSN while holding buffer header lock */
> > - buf_state = LockBufHdr(buf_hdr);
> > + /*
> > + * As we now hold at least a share-exclusive lock on the buffer,
> > + * the LSN cannot change during the flush (and thus can't be
> > + * torn).
> > + */
> > lsn = BufferGetLSN(buf_hdr);
> > - UnlockBufHdr(buf_hdr);
> > if (XLogNeedsFlush(lsn)
> > && StrategyRejectBuffer(strategy, buf_hdr, from_ring))
>
> I think the second comment is redundant with the first one. Let's just
> remove it.

Agreed.

> > +/*
> > + * Helper for BufferBeginSetHintBits() and BufferSetHintBits16().
> > + *
> > + * This checks if the current lock mode already suffices to allow hint bits
> > + * being set and, if not, whether the current lock can be upgraded.
> > + *
> > + * Updates *lockstate when returning true.
> > + */
> > +static inline bool
> > +SharedBufferBeginSetHintBits(Buffer buffer, BufferDesc *buf_hdr, uint64 *lockstate)
>
> Would be good to be more explicit what returning true/false here means.

Hm, ISTM that'd just end up restating the comments for
BufferBeginSetHintBits(). Given SharedBufferBeginSetHintBits() is just an
implementation detail for BufferBeginSetHintBits/BufferSetHintBits16, I don't
think it's worth restating the details here.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Gyan Sreejith 2026-02-09 02:08:15 Re: [Proposal] Adding Log File Capability to pg_createsubscriber
Previous Message Peter Smith 2026-02-09 01:10:52 Re: Skipping schema changes in publication