Re: Buffer locking is special (hints, checksums, AIO writes)

From: Andres Freund <andres(at)anarazel(dot)de>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Melanie Plageman <melanieplageman(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Subject: Re: Buffer locking is special (hints, checksums, AIO writes)
Date: 2026-01-09 00:29:35
Message-ID: 4csodkvvfbfloxxjlkgsnl2lgfv2mtzdl7phqzd4jxjadxm4o5@usw7feyb5bzf
Lists: pgsql-hackers

Hi,

I pushed what was 0001, 0002 in v8. Attached is an updated set of patches for
the rest.

Main changes:

- the explanation in "heapam: Use exclusive lock on old page in CLUSTER" as to
why it's problematic to not set hint bits wasn't quite right. I've updated
it:

heapam: Use exclusive lock on old page in CLUSTER

To be able to guarantee that we can set hint bits, acquire an exclusive
lock on the old buffer. This is required because a future commit will only
allow hint bits to be set with a new lock level, which is acquired as-needed
in a non-blocking fashion.

We need the hint bits that heapam_relation_copy_for_cluster() ->
HeapTupleSatisfiesVacuum() sets to actually be set, as otherwise
reform_and_rewrite_tuple() -> rewrite_heap_tuple() will get confused.
Specifically, rewrite_heap_tuple() checks for HEAP_XMAX_INVALID in the old
tuple to determine whether to check the old-to-new mapping hash table.

- I added a patch that inverts the meaning of LW_FLAG_RELEASE_OK, to simplify
the equivalent code for content locks. For buffer content locks we reset the
flags when invalidating a buffer; without the inversion we'd either need to
exclude the equivalent of LW_FLAG_RELEASE_OK from the flag mask or explicitly
set it again after making the buffer valid.

I think it's also nicer this way round, because, e.g., we can assert that
there are no pending wakeups when invalidating a buffer.

- I added a patch to reorganize some of the flags stuff in buf_internals.h, to
make the later patches cleaner. In particular, flags are now defined with a
macro, so that changing the bit offset at which the flags start doesn't
require touching every single flag value.

- For the main commit, the reorganized flag stuff removed one of the remaining
FIXMEs.

- I removed the performance instrumentation stuff from the batch visibility
commit.

I think 0001, 0002, 0003 can be committed. 0004, 0005 are new and probably
could use a sanity check. 0006 hasn't changed much and is imo pretty much
ready, but should be pushed together with 0007. 0007 is getting close, I
think. 0008-0010 need a bit more work, but I think that can wait until 0007
has been pushed.

For 0008, it'd be nice if somebody could look at the way buf_internals.h now
looks.

The only remaining FIXME in 0008 is about the reuse of
PGPROC->{lwWaiting,lwWaitMode,lwWaitLink}. I think reusing them for content
locks isn't pretty, but it's probably not worth duplicating them. Thoughts?

Greetings,

Andres Freund

Attachment Content-Type Size
v9-0001-freespace-Don-t-modify-page-without-any-lock.patch text/x-diff 2.0 KB
v9-0002-heapam-Use-exclusive-lock-on-old-page-in-CLUSTER.patch text/x-diff 3.9 KB
v9-0003-heapam-Add-batch-mode-mvcc-check-and-use-it-in-pa.patch text/x-diff 7.5 KB
v9-0004-lwlock-Invert-meaning-of-LW_FLAG_RELEASE_OK.patch text/x-diff 5.5 KB
v9-0005-bufmgr-Make-definitions-related-to-buffer-descrip.patch text/x-diff 4.5 KB
v9-0006-bufmgr-Change-BufferDesc.state-to-be-a-64bit-atom.patch text/x-diff 45.1 KB
v9-0007-bufmgr-Implement-buffer-content-locks-independent.patch text/x-diff 46.0 KB
v9-0008-Require-share-exclusive-lock-to-set-hint-bits-and.patch text/x-diff 37.8 KB
v9-0009-WIP-Make-UnlockReleaseBuffer-more-efficient.patch text/x-diff 3.5 KB
v9-0010-WIP-bufmgr-Don-t-copy-pages-while-writing-out.patch text/x-diff 11.6 KB
