[RFC PATCH v2 RESEND 0/10] Umbra: a remap-aware smgr prototype

From: Mingwei Jia <i(at)nayishan(dot)top>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: [RFC PATCH v2 RESEND 0/10] Umbra: a remap-aware smgr prototype
Date: 2026-06-01 23:22:42
Message-ID: 20260601232242.67658-1-i@nayishan.top
Lists: pgsql-hackers

Hi hackers,

This is a RESEND of the RFC v2 Umbra patch series, sent as a standard
threaded patch series.

The previous v2 attempt was sent as a single cover-letter message with
10 patch attachments. It was held for moderation and did not appear in
the public pgsql-hackers archives. This resend keeps the same code and
review scope, but sends the patches in the usual 0/10 .. 10/10 form.

This is a follow-up to my previous PoC note about Umbra, sent on
April 24, 2026:

https://www.mail-archive.com/pgsql-hackers%40lists.postgresql.org/msg227384.html

Umbra is an smgr-layer prototype on PostgreSQL master. It is not a
table AM proposal, and it does not try to introduce a separate storage
engine abstraction.

The central idea is still the same as in the previous note: decouple
logical block identity from physical page placement by maintaining
lblk -> pblk translation in Umbra metadata, so that ordinary data-page
updates after checkpoint do not have to rely on PostgreSQL's default
full-page-image path in the same way.
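
As a rough illustration of the lblk -> pblk idea only (this is not code
from the patch series; every name, size, and structure below is
hypothetical), a toy in-memory remap table might look like the following.
It starts from an identity mapping, writes each new page version to a
never-used physical slot, and only then repoints the logical block.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_NLBLK     4     /* logical blocks in the toy relation */
#define TOY_NPBLK     16    /* physical page slots in the toy file */
#define TOY_PAGE_SIZE 8     /* tiny pages, for illustration only */
#define TOY_INVALID   UINT32_MAX

/* Hypothetical structure; not Umbra's metadata layout. */
typedef struct ToyRemap
{
    uint32_t map[TOY_NLBLK];                 /* lblk -> pblk translation */
    uint32_t next_pblk;                      /* next never-used slot */
    char     phys[TOY_NPBLK][TOY_PAGE_SIZE]; /* stand-in for the data file */
} ToyRemap;

/* Start from an identity mapping: lblk i lives at pblk i. */
static void
toy_init(ToyRemap *r)
{
    memset(r, 0, sizeof(*r));
    for (uint32_t i = 0; i < TOY_NLBLK; i++)
        r->map[i] = i;
    r->next_pblk = TOY_NLBLK;
}

/* Translate a logical block number; TOY_INVALID if out of range. */
static uint32_t
toy_lookup(const ToyRemap *r, uint32_t lblk)
{
    return (lblk < TOY_NLBLK) ? r->map[lblk] : TOY_INVALID;
}

/*
 * Write a new page version out of place, then repoint the mapping.
 * In this toy the old physical page is never overwritten, so a partial
 * write could only affect the new copy, which is not yet reachable
 * through the map.
 */
static uint32_t
toy_write_remapped(ToyRemap *r, uint32_t lblk, const char *page)
{
    uint32_t new_pblk;

    if (lblk >= TOY_NLBLK || r->next_pblk >= TOY_NPBLK)
        return TOY_INVALID;
    new_pblk = r->next_pblk++;
    memcpy(r->phys[new_pblk], page, TOY_PAGE_SIZE);
    r->map[lblk] = new_pblk;
    return new_pblk;
}

/* Usage: update lblk 2 and check that the old copy survives untouched. */
static void
toy_demo(void)
{
    ToyRemap r;
    char     v1[TOY_PAGE_SIZE] = "old";
    char     v2[TOY_PAGE_SIZE] = "new";

    toy_init(&r);
    memcpy(r.phys[2], v1, TOY_PAGE_SIZE);
    assert(toy_write_remapped(&r, 2, v2) == 4);
    assert(toy_lookup(&r, 2) == 4);
    assert(memcmp(r.phys[2], v1, TOY_PAGE_SIZE) == 0);
}
```

The sketch ignores durability of the map itself, which is exactly what
the metadata-fork questions below are about.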

This is an RFC / proof-of-concept patch series, not a merge-ready
submission. I am mainly looking for feedback on these questions:

1. Umbra is built around the metadata fork. Is the current metadata-fork
   layout and update protocol a plausible way to avoid a recursive
   dependency on PostgreSQL's ordinary full-page-write mechanism? In
   other words, can the metadata fork itself remain crash-safe without
   introducing nested FPW requirements?

2. Is the smgr layer an acceptable boundary for this experiment, or
   should this idea be discussed at a different layer?

3. Is the WAL / remap / redo correctness model conceptually sound,
   especially the way remap state is published to WAL and reconstructed
   before replaying page contents?

4. Should the patch series be split differently before deeper review?
   If so, which part would make the best first independently reviewable
   patch?
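
To make the ordering in question 3 concrete, here is a minimal sketch,
assuming a record format that carries an optional remap plus a one-byte
page change. The record layout, the copy-forward step, and all names are
assumptions for illustration and are not taken from the Umbra WAL
records. The only point shown is that redo installs the mapping first
and then applies the page change through it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RD_NLBLK     4
#define RD_NPBLK     16
#define RD_PAGE_SIZE 8

/* Hypothetical WAL record: optional remap plus a one-byte page change. */
typedef struct RdRecord
{
    uint32_t lblk;       /* logical block the record refers to */
    int      has_remap;  /* does the record publish a new mapping? */
    uint32_t new_pblk;   /* valid only when has_remap is set */
    uint32_t off;        /* byte offset of the change within the page */
    char     byte;       /* new value at that offset */
} RdRecord;

typedef struct RdState
{
    uint32_t map[RD_NLBLK];                /* lblk -> pblk */
    char     phys[RD_NPBLK][RD_PAGE_SIZE]; /* stand-in for the data file */
} RdState;

/* Replay one record: remap state first, page contents second. */
static int
rd_redo_one(RdState *s, const RdRecord *rec)
{
    if (rec->lblk >= RD_NLBLK || rec->off >= RD_PAGE_SIZE)
        return -1;
    if (rec->has_remap)
    {
        if (rec->new_pblk >= RD_NPBLK)
            return -1;
        /* Toy assumption: carry the old contents forward to the new slot. */
        memmove(s->phys[rec->new_pblk], s->phys[s->map[rec->lblk]],
                RD_PAGE_SIZE);
        s->map[rec->lblk] = rec->new_pblk;
    }
    /* The change always lands on whatever the map now points at. */
    s->phys[s->map[rec->lblk]][rec->off] = rec->byte;
    return 0;
}

/* Usage: a remapping update followed by an ordinary update. */
static void
rd_demo(void)
{
    RdState  s;
    RdRecord r1 = {1, 1, 5, 0, 'X'};
    RdRecord r2 = {1, 0, 0, 1, 'Y'};

    memset(&s, 0, sizeof(s));
    for (uint32_t i = 0; i < RD_NLBLK; i++)
        s.map[i] = i;
    memcpy(s.phys[1], "aaaaaaa", RD_PAGE_SIZE);

    assert(rd_redo_one(&s, &r1) == 0);
    assert(rd_redo_one(&s, &r2) == 0);
    assert(s.map[1] == 5);
    assert(memcmp(s.phys[5], "XYaaaaa", RD_PAGE_SIZE) == 0);
    assert(memcmp(s.phys[1], "aaaaaaa", RD_PAGE_SIZE) == 0);
}
```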

The foreground/background split is also unchanged from the previous
description. The current PoC deliberately uses a conservative reclaim
policy: foreground paths allocate physical pages monotonically and avoid
synchronous reclaim, defragmentation, or relocation in user write paths.
mapwriter handles MAP-page flushing and physical preallocation, while
mapcompactor handles longer-term reclaim and compaction work.

This should not be read as claiming that Umbra's overall correctness
model is simple. The point is narrower: the conservative reclaim policy
reduces the number of foreground/checkpoint/WAL-redo/reclaim
interleavings that the prototype has to support initially, while leaving
more aggressive space-convergence work to later engineering.
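
The foreground/background split described above can be sketched as
follows. This is a toy model under stated assumptions, not the patch's
implementation: the structure, the function names, and the behavior when
the preallocated range runs out (the toy simply returns an error) are all
hypothetical. The foreground functions only bump a watermark and record
superseded pages. Preallocation and reclaim happen in separate calls that
stand in for the mapwriter and mapcompactor roles.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SP_NPBLK   8
#define SP_INVALID UINT32_MAX

typedef struct SpSpace
{
    uint32_t next_pblk;      /* foreground allocation watermark */
    uint32_t prealloc_end;   /* slots [0, prealloc_end) are ready to use */
    uint32_t dead[SP_NPBLK]; /* superseded pblks awaiting reclaim */
    uint32_t ndead;
    uint32_t nreclaimed;     /* slots handed back by the background pass */
} SpSpace;

static void
sp_init(SpSpace *s, uint32_t prealloc)
{
    memset(s, 0, sizeof(*s));
    s->prealloc_end = (prealloc < SP_NPBLK) ? prealloc : SP_NPBLK;
}

/* Foreground: monotonic allocation only; never reclaims or relocates. */
static uint32_t
sp_fg_alloc(SpSpace *s)
{
    if (s->next_pblk >= s->prealloc_end)
        return SP_INVALID;      /* toy assumption: fail when exhausted */
    return s->next_pblk++;
}

/* Foreground: note a superseded page, leaving reclaim for later. */
static int
sp_fg_retire(SpSpace *s, uint32_t old_pblk)
{
    if (old_pblk >= s->next_pblk || s->ndead >= SP_NPBLK)
        return -1;
    s->dead[s->ndead++] = old_pblk;
    return 0;
}

/* Background, mapwriter-like role: extend the preallocated range. */
static void
sp_bg_preallocate(SpSpace *s, uint32_t n)
{
    uint32_t room = SP_NPBLK - s->prealloc_end;

    s->prealloc_end += (n < room) ? n : room;
}

/* Background, mapcompactor-like role: reclaim every retired slot. */
static uint32_t
sp_bg_reclaim(SpSpace *s)
{
    uint32_t n = s->ndead;

    s->nreclaimed += n;
    s->ndead = 0;
    return n;
}

/* Usage: the watermark only moves forward, even after reclaim. */
static void
sp_demo(void)
{
    SpSpace s;

    sp_init(&s, 2);
    assert(sp_fg_alloc(&s) == 0);
    assert(sp_fg_alloc(&s) == 1);
    assert(sp_fg_alloc(&s) == SP_INVALID);
    assert(sp_fg_retire(&s, 0) == 0);
    sp_bg_preallocate(&s, 2);
    assert(sp_fg_alloc(&s) == 2);
    assert(sp_bg_reclaim(&s) == 1);
    assert(s.next_pblk == 3 && s.ndead == 0);
}
```

Keeping reclaim out of the foreground functions is what shrinks the set
of interleavings in this toy: a user write path can never observe a
physical slot being reused underneath it.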

The verification state is the same as described in the previous note:
the final tip has passed `make check` and
`make -C src/test/recovery check` in both regular and `--with-umbra`
builds. Per-patch results for the four-item matrix (MD `make check`, MD
recovery, UMBRA `make check`, UMBRA recovery) remain:

- P1-P5: all four matrix items pass
- P6: MD make check / MD recovery / UMBRA make check pass, but UMBRA
  recovery does not pass at that boundary
- P7-P9: all four matrix items pass

That boundary is intentional in the current decomposition: P6 establishes
the WAL record / birth / basic redo state-machine layer, while P7 closes
the ordinary remap / block-reference remap / checkpoint-boundary
replacement loop.

The performance numbers from the previous note should still be read only
as directional PoC signals, not as final benchmark claims. The
`md + full_page_writes=off` numbers are only a sensitivity / upper-bound
reference, not a correctness-equivalent baseline.

The repository remains available here as a supplementary reference:

https://github.com/nayishan/postgre_umbra/tree/umbra-poc-pgmaster

Mingwei Jia (10):
  umbra: add patch 0 design notes and repository navigation
  umbra: add patch 1 smgr implementation boundary
  umbra: add patch 2 umfile physical file manager and metadata storage
    primitives
  umbra: add patch 3 metadata disk format and identity mapping bootstrap
  umbra: add patch 4 shared-memory MAP cache and checkpoint flush
  umbra: add patch 5 MAP access policy, translation, and materialization
  umbra: add patch 6 WAL records, mapped birth, and redo state machine
  umbra: add patch 7 checkpoint-boundary FPW replacement and
    block-reference remap
  umbra: add patch 8 checkpoint/mapwriter writeback and physical
    preallocation
  umbra: add patch 9 compactor framework and non-interference policy

README.md | 261 +-
README_ZH.md | 241 ++
configure | 38 +
configure.ac | 10 +
doc/umbra/ARCHITECTURE.md | 437 +++
doc/umbra/ARCHITECTURE_ZH.md | 282 ++
doc/umbra/PROTOTYPE.md | 86 +
doc/umbra/PROTOTYPE_ZH.md | 74 +
doc/umbra/REVIEW_GUIDE.md | 210 ++
doc/umbra/REVIEW_GUIDE_ZH.md | 133 +
doc/umbra/UMBRA_FPW_STORY.md | 708 +++++
doc/umbra/UMBRA_FPW_STORY_ZH.md | 500 ++++
doc/umbra/WAL_AND_REDO.md | 419 +++
doc/umbra/WAL_AND_REDO_ZH.md | 248 ++
meson.build | 1 +
meson_options.txt | 3 +
src/Makefile.global.in | 1 +
src/backend/access/brin/brin.c | 2 +-
src/backend/access/brin/brin_pageops.c | 4 +-
src/backend/access/brin/brin_revmap.c | 2 +-
src/backend/access/gin/gindatapage.c | 2 +-
src/backend/access/gin/ginfast.c | 8 +-
src/backend/access/gin/ginutil.c | 2 +-
src/backend/access/gist/gistxlog.c | 2 +-
src/backend/access/hash/hashovfl.c | 4 +-
src/backend/access/hash/hashpage.c | 16 +-
src/backend/access/heap/heapam.c | 6 +-
src/backend/access/heap/heapam_handler.c | 10 +-
src/backend/access/nbtree/nbtinsert.c | 8 +-
src/backend/access/nbtree/nbtpage.c | 14 +-
src/backend/access/rmgrdesc/Makefile | 5 +
src/backend/access/rmgrdesc/meson.build | 6 +
src/backend/access/rmgrdesc/umbradesc.c | 116 +
src/backend/access/rmgrdesc/xlogdesc.c | 25 +
src/backend/access/spgist/spgdoinsert.c | 14 +-
src/backend/access/transam/Makefile | 5 +
src/backend/access/transam/meson.build | 6 +
src/backend/access/transam/rmgr.c | 3 +
src/backend/access/transam/umbra_xlog.c | 366 +++
src/backend/access/transam/xlog.c | 6 +
src/backend/access/transam/xloginsert.c | 744 ++++-
src/backend/access/transam/xlogreader.c | 40 +
src/backend/access/transam/xlogutils.c | 560 +++-
src/backend/backup/basebackup.c | 22 +-
src/backend/catalog/storage.c | 198 +-
src/backend/commands/dbcommands.c | 19 +
src/backend/commands/sequence.c | 6 +-
src/backend/commands/tablecmds.c | 11 +-
src/backend/common.mk | 2 +-
src/backend/postmaster/Makefile | 6 +
src/backend/postmaster/bgworker.c | 14 +
src/backend/postmaster/mapcompactor.c | 151 +
src/backend/postmaster/mapwriter.c | 198 ++
src/backend/postmaster/meson.build | 7 +
src/backend/postmaster/postmaster.c | 7 +
src/backend/storage/Makefile | 5 +
src/backend/storage/buffer/bufmgr.c | 14 +-
src/backend/storage/map/Makefile | 25 +
src/backend/storage/map/map.c | 1547 ++++++++++
src/backend/storage/map/mapbgproc.c | 1063 +++++++
src/backend/storage/map/mapbuf.c | 428 +++
src/backend/storage/map/mapclock.c | 464 +++
src/backend/storage/map/mapflush.c | 665 +++++
src/backend/storage/map/mapinflight.c | 402 +++
src/backend/storage/map/mapinit.c | 239 ++
src/backend/storage/map/mapsuper.c | 1789 +++++++++++
src/backend/storage/map/meson.build | 12 +
src/backend/storage/meson.build | 3 +
src/backend/storage/smgr/Makefile | 9 +
src/backend/storage/smgr/bulk_write.c | 53 +-
src/backend/storage/smgr/md.c | 1 +
src/backend/storage/smgr/meson.build | 7 +
src/backend/storage/smgr/smgr.c | 359 ++-
src/backend/storage/smgr/umbra.c | 2659 +++++++++++++++++
src/backend/storage/smgr/umfile.c | 2613 ++++++++++++++++
src/backend/storage/sync/sync.c | 113 +-
.../utils/activity/wait_event_names.txt | 5 +
src/backend/utils/adt/dbsize.c | 14 +-
src/backend/utils/adt/pgstatfuncs.c | 25 +
src/backend/utils/cache/relcache.c | 12 +-
src/backend/utils/init/postinit.c | 8 +
src/backend/utils/misc/guc_parameters.dat | 202 ++
src/backend/utils/misc/guc_tables.c | 2 +
src/backend/utils/misc/postgresql.conf.sample | 2 +
src/bin/pg_waldump/.gitignore | 1 +
src/bin/pg_waldump/Makefile | 9 +
src/bin/pg_waldump/rmgrdesc.c | 3 +
src/include/access/rmgrlist.h | 3 +
src/include/access/umbra_xlog.h | 109 +
src/include/access/xloginsert.h | 4 +
src/include/access/xlogreader.h | 11 +
src/include/access/xlogrecord.h | 49 +
src/include/access/xlogutils.h | 3 +
src/include/catalog/pg_proc.dat | 20 +
src/include/catalog/storage.h | 1 +
src/include/pg_config.h.in | 3 +
src/include/postmaster/mapwriter.h | 28 +
src/include/storage/aio_types.h | 3 +-
src/include/storage/lwlocklist.h | 1 +
src/include/storage/map.h | 323 ++
src/include/storage/map_internal.h | 59 +
src/include/storage/mapsuper.h | 100 +
src/include/storage/mapsuper_internal.h | 174 ++
src/include/storage/smgr.h | 37 +-
src/include/storage/subsystemlist.h | 3 +
src/include/storage/sync.h | 2 +
src/include/storage/um_defs.h | 51 +
src/include/storage/umbra.h | 179 ++
src/include/storage/umfile.h | 122 +
src/test/recovery/meson.build | 22 +
.../t/053_umbra_map_superblock_watermark.pl | 104 +
.../recovery/t/054_umbra_map_fork_policy.pl | 62 +
.../t/055_umbra_mapwriter_activity.pl | 56 +
.../t/056_umbra_truncate_superblock.pl | 82 +
.../t/057_umbra_remap_crash_consistency.pl | 74 +
.../t/058_umbra_2pc_remap_recovery.pl | 90 +
.../t/059_umbra_compactor_relocation.pl | 91 +
.../060_umbra_reclaim_checkpoint_counters.pl | 82 +
.../t/061_umbra_fsm_vm_map_translation.pl | 117 +
.../t/062_umbra_truncate_drop_crash_matrix.pl | 108 +
...3_umbra_mainfork_head_unlink_checkpoint.pl | 60 +
...64_umbra_mainfork_internal_reclaim_seg0.pl | 283 ++
...umbra_mainfork_middle_reclaim_keep_seg0.pl | 356 +++
.../recovery/t/066_umbra_truncate_redo.pl | 64 +
src/test/recovery/t/067_umbra_remap_redo.pl | 90 +
...68_umbra_old_baseline_checkpoint_window.pl | 85 +
.../t/069_umbra_range_remap_zeroextend.pl | 101 +
.../t/070_umbra_hash_birth_block_remap.pl | 66 +
.../t/071_umbra_skip_wal_dense_map.pl | 65 +
.../t/072_umbra_ordinary_slim_block_remap.pl | 69 +
.../recovery/t/073_umbra_preallocate_guc.pl | 74 +
.../recovery/t/074_umbra_torn_page_remap.pl | 261 ++
132 files changed, 22588 insertions(+), 181 deletions(-)
create mode 100644 README_ZH.md
create mode 100644 doc/umbra/ARCHITECTURE.md
create mode 100644 doc/umbra/ARCHITECTURE_ZH.md
create mode 100644 doc/umbra/PROTOTYPE.md
create mode 100644 doc/umbra/PROTOTYPE_ZH.md
create mode 100644 doc/umbra/REVIEW_GUIDE.md
create mode 100644 doc/umbra/REVIEW_GUIDE_ZH.md
create mode 100644 doc/umbra/UMBRA_FPW_STORY.md
create mode 100644 doc/umbra/UMBRA_FPW_STORY_ZH.md
create mode 100644 doc/umbra/WAL_AND_REDO.md
create mode 100644 doc/umbra/WAL_AND_REDO_ZH.md
create mode 100644 src/backend/access/rmgrdesc/umbradesc.c
create mode 100644 src/backend/access/transam/umbra_xlog.c
create mode 100644 src/backend/postmaster/mapcompactor.c
create mode 100644 src/backend/postmaster/mapwriter.c
create mode 100644 src/backend/storage/map/Makefile
create mode 100644 src/backend/storage/map/map.c
create mode 100644 src/backend/storage/map/mapbgproc.c
create mode 100644 src/backend/storage/map/mapbuf.c
create mode 100644 src/backend/storage/map/mapclock.c
create mode 100644 src/backend/storage/map/mapflush.c
create mode 100644 src/backend/storage/map/mapinflight.c
create mode 100644 src/backend/storage/map/mapinit.c
create mode 100644 src/backend/storage/map/mapsuper.c
create mode 100644 src/backend/storage/map/meson.build
create mode 100644 src/backend/storage/smgr/umbra.c
create mode 100644 src/backend/storage/smgr/umfile.c
create mode 100644 src/include/access/umbra_xlog.h
create mode 100644 src/include/postmaster/mapwriter.h
create mode 100644 src/include/storage/map.h
create mode 100644 src/include/storage/map_internal.h
create mode 100644 src/include/storage/mapsuper.h
create mode 100644 src/include/storage/mapsuper_internal.h
create mode 100644 src/include/storage/um_defs.h
create mode 100644 src/include/storage/umbra.h
create mode 100644 src/include/storage/umfile.h
create mode 100644 src/test/recovery/t/053_umbra_map_superblock_watermark.pl
create mode 100644 src/test/recovery/t/054_umbra_map_fork_policy.pl
create mode 100644 src/test/recovery/t/055_umbra_mapwriter_activity.pl
create mode 100644 src/test/recovery/t/056_umbra_truncate_superblock.pl
create mode 100644 src/test/recovery/t/057_umbra_remap_crash_consistency.pl
create mode 100644 src/test/recovery/t/058_umbra_2pc_remap_recovery.pl
create mode 100644 src/test/recovery/t/059_umbra_compactor_relocation.pl
create mode 100644 src/test/recovery/t/060_umbra_reclaim_checkpoint_counters.pl
create mode 100644 src/test/recovery/t/061_umbra_fsm_vm_map_translation.pl
create mode 100644 src/test/recovery/t/062_umbra_truncate_drop_crash_matrix.pl
create mode 100644 src/test/recovery/t/063_umbra_mainfork_head_unlink_checkpoint.pl
create mode 100644 src/test/recovery/t/064_umbra_mainfork_internal_reclaim_seg0.pl
create mode 100644 src/test/recovery/t/065_umbra_mainfork_middle_reclaim_keep_seg0.pl
create mode 100644 src/test/recovery/t/066_umbra_truncate_redo.pl
create mode 100644 src/test/recovery/t/067_umbra_remap_redo.pl
create mode 100644 src/test/recovery/t/068_umbra_old_baseline_checkpoint_window.pl
create mode 100644 src/test/recovery/t/069_umbra_range_remap_zeroextend.pl
create mode 100644 src/test/recovery/t/070_umbra_hash_birth_block_remap.pl
create mode 100644 src/test/recovery/t/071_umbra_skip_wal_dense_map.pl
create mode 100644 src/test/recovery/t/072_umbra_ordinary_slim_block_remap.pl
create mode 100644 src/test/recovery/t/073_umbra_preallocate_guc.pl
create mode 100644 src/test/recovery/t/074_umbra_torn_page_remap.pl

--
2.50.1 (Apple Git-155)
