From: Noah Misch Comment on need to MarkBufferDirty() if omitting DELAY_CHKPT_START. Blocking checkpoint phase 2 requires MarkBufferDirty() and BUFFER_LOCK_EXCLUSIVE; neither suffices by itself. transam/README documents this, citing SyncOneBuffer(). Update the DELAY_CHKPT_START documentation to say this. Expand the heap_inplace_update_and_unlock() comment that cites XLogSaveBufferForHint() as precedent, since heap_inplace_update_and_unlock() could have opted not to use DELAY_CHKPT_START. Commit 8e7e672cdaa6bfec85d4d5dd9be84159df23bb41 added DELAY_CHKPT_START to heap_inplace_update_and_unlock(). Since commit bc6bad88572501aecaa2ac5d4bc900ac0fd457d5 reverted it in non-master branches, no back-patch. Discussion: https://postgr.es/m/20250406180054.26.nmisch@google.com diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index ed2e302..c1a4de1 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -6507,9 +6507,17 @@ heap_inplace_update_and_unlock(Relation relation, * [crash] * [recovery restores datfrozenxid w/o relfrozenxid] * - * Like in MarkBufferDirtyHint() subroutine XLogSaveBufferForHint(), copy - * the buffer to the stack before logging. Here, that facilitates a FPI - * of the post-mutation block before we accept other sessions seeing it. + * Mimic MarkBufferDirtyHint() subroutine XLogSaveBufferForHint(). + * Specifically, use DELAY_CHKPT_START, and copy the buffer to the stack. + * The stack copy facilitates a FPI of the post-mutation block before we + * accept other sessions seeing it. DELAY_CHKPT_START allows us to + * XLogInsert() before MarkBufferDirty(). Since XLogSaveBufferForHint() + * can operate under BUFFER_LOCK_SHARED, it can't avoid DELAY_CHKPT_START. + * This function, however, likely could avoid it with the following order + * of operations: MarkBufferDirty(), XLogInsert(), memcpy(). Opt to use + * DELAY_CHKPT_START here, too, as a way to have fewer distinct code + * patterns to analyze. Inplace update isn't so frequent that it should + * pursue the small optimization of skipping DELAY_CHKPT_START. */ Assert((MyProc->delayChkptFlags & DELAY_CHKPT_START) == 0); START_CRIT_SECTION(); diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h index f51b03d..86c5f99 100644 --- a/src/include/storage/proc.h +++ b/src/include/storage/proc.h @@ -110,10 +110,10 @@ extern PGDLLIMPORT int FastPathLockGroupsPerBackend; * is inserted prior to the new redo point, the corresponding data changes will * also be flushed to disk before the checkpoint can complete. (In the * extremely common case where the data being modified is in shared buffers - * and we acquire an exclusive content lock on the relevant buffers before - * writing WAL, this mechanism is not needed, because phase 2 will block - * until we release the content lock and then flush the modified data to - * disk.) + * and we acquire an exclusive content lock and MarkBufferDirty() on the + * relevant buffers before writing WAL, this mechanism is not needed, because + * phase 2 will block until we release the content lock and then flush the + * modified data to disk. See transam/README and SyncOneBuffer().) * * Setting DELAY_CHKPT_COMPLETE prevents the system from moving from phase 2 * to phase 3. This is useful if we are performing a WAL-logged operation that