Re: Buffer locking is special (hints, checksums, AIO writes)

From: Noah Misch <noah(at)leadboat(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Melanie Plageman <melanieplageman(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Subject: Re: Buffer locking is special (hints, checksums, AIO writes)
Date: 2026-02-15 19:52:39
Message-ID: 20260215195239.ce.noahmisch@microsoft.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Feb 07, 2026 at 12:44:25PM +0200, Heikki Linnakangas wrote:
> On 03/02/2026 00:33, Andres Freund wrote:
> > - The way MarkBufferDirtyHint() operates was copied into
> > heap_inplace_update_and_unlock(). Now that MarkBufferDirtyHint() won't work
> > that way anymore, it seems better to go with the alternative approach the
> > comments already outlined, namely to only delay updating of the buffer
> > contents.
> >
> > I've done this in a prequisite commit, as it doesn't actually depend on any
> > of the other changes. Noah, any chance you could take a look at this?

v12-0001-heapam-Don-t-mimic-MarkBufferDirtyHint-in-inplac.patch looks good.

> Patch 0001 Looks correct to me. However:
>
> > * ["D" is a VACUUM (ONLY_DATABASE_STATS)]
> > * ["R" is a VACUUM tbl]
> > * D: vac_update_datfrozenxid() -> systable_beginscan(pg_class)
> > * D: systable_getnext() returns pg_class tuple of tbl
> > * R: memcpy() into pg_class tuple of tbl
> > * D: raise pg_database.datfrozenxid, XLogInsert(), finish
> > * [crash]
> > * [recovery restores datfrozenxid w/o relfrozenxid]
> > *
> > * As we hold an exclusive lock - preventing the buffer from being written
> > * out once dirty - we can work around this as follows: MarkBufferDirty(),
> > * XLogInsert(), memcpy().
>
> That last reference to 'memcpy' is a little orphaned now. The comment used
> to talk about the stack copy of the page, but now there's no mention of that
> except for this reference to memcpy(). To make things worse, the steps have
> "memcpy() into pg_class tuple of tbl", so one could think that the "memcpy"
> refers to that.

"memcpy" does refer to "memcpy() into pg_class tuple of tbl", so I don't see
that as orphaned. Nonetheless:

> How about this:
>
> * We avoid that by using a temporary copy of the buffer to hide our
> * change from other backends until it's been WAL-logged. We apply our
> * change to the temporary copy and WAL-log it before modifying the real
> * page. That way any action a reader of the in-place-updated value takes
> * will be WAL logged after this change.

Either v12 or v12 w/ this edit is fine with me. I find this proposed text
redundant with nearby comment "register block matching what buffer will look
like after changes", so I mildly prefer v12.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2026-02-15 19:56:07 Inconsistency in installation of syscache_info.h
Previous Message Andres Freund 2026-02-15 19:38:40 Re: POC: enable logical decoding when wal_level = 'replica' without a server restart