Re: Buffer locking is special (hints, checksums, AIO writes)

From: Andres Freund <andres(at)anarazel(dot)de>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Subject: Re: Buffer locking is special (hints, checksums, AIO writes)
Date: 2025-11-20 02:47:49
Message-ID: 6rgb2nvhyvnszz4ul3wfzlf5rheb2kkwrglthnna7qhe24onwr@vw27225tkyar
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2025-10-09 17:16:49 -0400, Andres Freund wrote:
> On 2025-10-09 16:35:44 -0400, Andres Freund wrote:
> > I pushed a few commits from this patchset after Matthias' review
> > (thanks!). Unfortunately in 5e899859287 I missed that the valgrind annotations
> > would not be done anymore for the buffers returned by
> > StrategyGetBuffer(). Which turned skink red.
> >
> > The attached 0001 patch centralizes the valgrind initialization in
> > TrackNewBufferPin(), which 5e899859287 had added. The nice side effect of that
> > is that there are fewer VALGRIND_MAKE_MEM_DEFINED() calls than before. The
> > naming isn't the perfect match, but it seems fine to me.
>
> Forgot to say: I'll push this patch soon, to get skink back to green. Unless
> somebody says something. We can adjust this later, if the comment and/or
> placement of VALGRIND_MAKE_MEM_DEFINED() isn't to everyones liking.

I have pushed that fix as well as the subsequent buffer header locking changes
a while ago.

Attached is a patchset that actually implements the buffer content locks in
bufmgr.c. This isn't that close to a committable shape yet, but it seemed
useful to get it out there. The first few patches seem closer, so it'll also
be useful to narrow this down.

0001: A straight-up bugfix in lwlock.c - albeit for a bug that seems currently
effectively harmless.

0002: Not really required, but seems like an improvement to me

0003: A prerequisite to 0004, pretty boring itself

0004: Use 64bit atomics for BufferDesc.state - at this point nothing uses the
additional bits yet, though. Some annoying reformatting required to avoid
long lines.

0005: There already was a wait event class for BUFFERPIN. It seems better to
make that more general than to implement them separately.

0006+0007: This is preparatory work for 0008, but also worthwhile on its
own. The private refcount stuff does show up in profiles. The reason it's
related is that without these changes the added information in 0008 makes that
worse.

0008: The main change. Implements buffer content locking independently from
lwlock.c. There's obviously a lot of similarity between lwlock.c code and
this, but I've not found a good way to reduce the duplication without giving
up too much. This patch does immediately introduce share-exclusive as a new
lock level, mostly because it was too painful to do separately.

0009+0010+0011: Preparatory work for 0012.

0012: One of the main goals of this patchset - use the new share-exclusive
lock level to only allow hint bits to be set while no IO is going on.

0013: Prototype of making UnlockReleaseBuffer() faster and of using it more
widely in nbtree.c

0014: Now that hint bits can't be done while IO is going on, we don't need to
copy pages anymore. This needs a fair bit more work, as denoted by the FIXMEs
in the code.

I've tried to add detail to the more important commit messages, at least until
0012.

I want to again emphasize that the important commits (i.e. 0008, 0012, 0014)
aren't close to being mergeable. But I think they're in a stage that they
could benefit from "lenient" high-level review.

Greetings,

Andres Freund

Attachment Content-Type Size
v6-0001-lwlock-Fix-currently-harmless-bug-in-LWLockWakeup.patch text/x-diff 1.5 KB
v6-0002-bufmgr-Turn-BUFFER_LOCK_-into-an-enum.patch text/x-diff 3.1 KB
v6-0003-Add-pg_atomic_unlocked_write_u64.patch text/x-diff 1.8 KB
v6-0004-bufmgr-Change-BufferDesc.state-to-be-a-64bit-atom.patch text/x-diff 43.6 KB
v6-0005-Rename-BUFFERPIN-wait-event-class-to-BUFFER.patch text/x-diff 6.5 KB
v6-0006-bufmgr-Separate-keys-for-private-refcount-infrast.patch text/x-diff 10.0 KB
v6-0007-bufmgr-Add-one-entry-cache-for-private-refcount.patch text/x-diff 1.9 KB
v6-0008-bufmgr-Implement-buffer-content-locks-independent.patch text/x-diff 39.9 KB
v6-0009-heapam-Move-logic-to-handle-HEAP_MOVED-into-a-hel.patch text/x-diff 11.5 KB
v6-0010-heapam-Use-exclusive-lock-on-old-page-in-CLUSTER.patch text/x-diff 2.7 KB
v6-0011-heapam-Add-batch-mode-mvcc-check-and-use-it-in-pa.patch text/x-diff 7.6 KB
v6-0012-Require-share-exclusive-lock-to-set-hint-bits.patch text/x-diff 32.7 KB
v6-0013-WIP-Make-UnlockReleaseBuffer-more-efficient.patch text/x-diff 3.5 KB
v6-0014-WIP-bufmgr-Don-t-copy-pages-while-writing-out.patch text/x-diff 11.6 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Chao Li 2025-11-20 03:05:26 Re: CREATE/ALTER PUBLICATION improvements for syntax synopsis
Previous Message Chao Li 2025-11-20 02:31:43 Re: Checkpointer write combining