Re: Buffer locking is special (hints, checksums, AIO writes)

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Subject: Re: Buffer locking is special (hints, checksums, AIO writes)
Date: 2025-09-22 22:14:12
Message-ID: 6kmid26do57ykqfpvq6iieniy4djsymhrypkjccazq5g4bbe6a@2y6owwv7qpex
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2025-09-15 19:05:37 -0400, Andres Freund wrote:
> On 2025-08-22 15:44:48 -0400, Andres Freund wrote:
> > The hardest part about this change is that everything kind of depends on each
> > other. The changes are large enough that they clearly can't just be committed
> > at once, but doing them over time risks [temporary] performance regressions.
> >
> >
> >
> >
> > The order of changes I think makes the most sense is the following:
> >
> > 1) Allow some modifications while holding the buffer header spinlock
> >
> > 2) Reduce buffer pin with just an atomic-sub
> >
> > This needs to happen first, otherwise there are performance regressions
> > during the later steps.
>
> Here are the first few cleaned up patches implementing the above steps, as
> well as some cleanups. I included a commit from another thread, as it
> conflicts with these changes, and we really should apply it - and it's
> arguably required to make the changes viable, as it removes one more use of
> PinBuffer_Locked().
>
> Another change included is to not return the buffer with the spinlock held
> from StrategyGetBuffer(), and instead pin the buffer in freelist.c. The reason
> for that is to reduce the most common PinBuffer_locked() call. By definition
> PinBuffer_locked() will become a bit slower due to 0003. But even without 0003
> it 0002 is faster than master. And the previous approach also just seems
> pretty unclean. I don't love that it requires the new TrackNewBufferPin(),
> but I don't really have a better idea.
>
> I invite particular attention to the commit message for 0003 as well as the
> comment changes in buf_internals.h within.

Robert looked at the patches while we were chatting, and I addressed his
feedback in this new version.

Changes:

- Updated patch description for 0002, giving a lot more background

- Improved BufferDesc comments a fair bit more in 0003

- Reduced the size of 0003 a bit, by using UnlockBufHdrExt() instead of a CAS
loop in buffer_stage_common() and reordering some things in
TerminateBufferIO()

- Made 0004 only use the non-looping atomic op in UnpinBufferNoOwner(), not
MarkBufferDirty(). I realized the latter would take additional complexity to
make safe (a CAS loop in TerminateBufferIO()). I am somewhat doubtful that
there are workloads where it matters...

Greetings,

Andres Freund

Attachment Content-Type Size
v4-0001-Improve-ReadRecentBuffer-scalability.patch text/x-diff 6.1 KB
v4-0002-bufmgr-Don-t-lock-buffer-header-in-StrategyGetBuf.patch text/x-diff 11.1 KB
v4-0003-bufmgr-Allow-some-buffer-state-modifications-whil.patch text/x-diff 31.2 KB
v4-0004-bufmgr-Use-atomic-sub-for-unpinning-buffers.patch text/x-diff 2.3 KB
v4-0005-bufmgr-fewer-calls-to-BufferDescriptorGetContentL.patch text/x-diff 7.2 KB
v4-0006-bufmgr-Introduce-FlushUnlockedBuffer.patch text/x-diff 4.5 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Chao Li 2025-09-22 22:21:03 Re: Fix wrong filename in header comment of gin_tuple.h
Previous Message Masahiko Sawada 2025-09-22 21:57:46 Re: POC: enable logical decoding when wal_level = 'replica' without a server restart