| From: | Noah Misch <noah(at)leadboat(dot)com> |
|---|---|
| To: | Andres Freund <andres(at)anarazel(dot)de>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
| Cc: | Melanie Plageman <melanieplageman(at)gmail(dot)com>, Kirill Reshke <reshkekirill(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> |
| Subject: | Re: Buffer locking is special (hints, checksums, AIO writes) |
| Date: | 2026-02-15 19:52:39 |
| Message-ID: | 20260215195239.ce.noahmisch@microsoft.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Sat, Feb 07, 2026 at 12:44:25PM +0200, Heikki Linnakangas wrote:
> On 03/02/2026 00:33, Andres Freund wrote:
> > - The way MarkBufferDirtyHint() operates was copied into
> > heap_inplace_update_and_unlock(). Now that MarkBufferDirtyHint() won't work
> > that way anymore, it seems better to go with the alternative approach the
> > comments already outlined, namely to only delay updating of the buffer
> > contents.
> >
> > I've done this in a prequisite commit, as it doesn't actually depend on any
> > of the other changes. Noah, any chance you could take a look at this?
v12-0001-heapam-Don-t-mimic-MarkBufferDirtyHint-in-inplac.patch looks good.
> Patch 0001 Looks correct to me. However:
>
> > * ["D" is a VACUUM (ONLY_DATABASE_STATS)]
> > * ["R" is a VACUUM tbl]
> > * D: vac_update_datfrozenxid() -> systable_beginscan(pg_class)
> > * D: systable_getnext() returns pg_class tuple of tbl
> > * R: memcpy() into pg_class tuple of tbl
> > * D: raise pg_database.datfrozenxid, XLogInsert(), finish
> > * [crash]
> > * [recovery restores datfrozenxid w/o relfrozenxid]
> > *
> > * As we hold an exclusive lock - preventing the buffer from being written
> > * out once dirty - we can work around this as follows: MarkBufferDirty(),
> > * XLogInsert(), memcpy().
>
> That last reference to 'memcpy' is a little orphaned now. The comment used
> to talk about the stack copy of the page, but now there's no mention of that
> except for this reference to memcpy(). To make things worse, the steps have
> "memcpy() into pg_class tuple of tbl", so one could think that the "memcpy"
> refers to that.
"memcpy" does refer to "memcpy() into pg_class tuple of tbl", so I don't see
that as orphaned. Nonetheless:
> How about this:
>
> * We avoid that by using a temporary copy of the buffer to hide our
> * change from other backends until it's been WAL-logged. We apply our
> * change to the temporary copy and WAL-log it before modifying the real
> * page. That way any action a reader of the in-place-updated value takes
> * will be WAL logged after this change.
Either v12 or v12 w/ this edit is fine with me. I find this proposed text
redundant with nearby comment "register block matching what buffer will look
like after changes", so I mildly prefer v12.
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Tom Lane | 2026-02-15 19:56:07 | Inconsistency in installation of syscache_info.h |
| Previous Message | Andres Freund | 2026-02-15 19:38:40 | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |