[PoC] Umbra: a remap-aware smgr prototype on PostgreSQL master

From: Mingwei Jia <i(at)nayishan(dot)top>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: i(at)nayishan(dot)top
Subject: [PoC] Umbra: a remap-aware smgr prototype on PostgreSQL master
Date: 2026-04-24 14:12:02
Message-ID: a8d4e49b-7f20-435f-8555-7907554fedaf@nayishan.top
Lists: pgsql-hackers

  Hi hackers,

  Apologies if my earlier attempt did not reach the list correctly. I
am sending this as a single PoC introduction with repository links only,
rather than as an attached patch series.

  I would like to share a working Proof-of-Concept for Umbra, an
alternative smgr implementation on PostgreSQL master.

  To be clear about scope: this is not a merge-ready proposal, and it
is not a new table AM or a separate storage engine. The goal is
narrower: to make the current design, code structure, recovery model,
and patch decomposition concrete enough for technical discussion, and
to preserve a usable baseline for anyone interested in continuing the
work.

  Umbra operates at the smgr layer. The central idea is to decouple
logical page identity from physical page placement, so that the
ordinary first-dirty-after-checkpoint path does not have to depend on
PostgreSQL's default full-page-image (full_page_writes) protection in
the same way. In the current prototype:

  - PostgreSQL callers still work in logical block numbers.
  - Umbra maintains lblk -> pblk translation in its own metadata fork.
  - WAL can publish remap state explicitly.
  - redo reconstructs the correct mapping view before replaying page
contents.

  Umbra's metadata fork contains only two formats: a 512-byte
superblock for fork-level control state, and single-purpose MAP pages
for mapping entries. These are not ordinary heap/index pages. In that
respect they are closer to system control/state metadata such as
pg_control and pg_xact/SLRU pages, and they do not rely on
PostgreSQL's ordinary FPW path for data pages. Instead, they are
protected by Umbra-specific metadata WAL/redo rules for those two
formats.

  The implementation is currently organized in the repository as:

  - P0: design notes and repository navigation
  - P1-P9: code patches covering smgr boundary, metadata fork, MAP
subsystem, WAL/redo, checkpoint integration, preallocation, and compaction

  Current verification state:

  - final tip passes `make check`
  - final tip passes `make -C src/test/recovery check`
  - strict per-patch state, over the four-item matrix {md, UMBRA} x
{`make check`, recovery check}, is:
    - P1-P5: all four matrix items pass
    - P6: MD make check / MD recovery / UMBRA make check pass, but
UMBRA recovery does not pass
    - P7-P9: all four matrix items pass

  That boundary is intentional in the current decomposition: P6
establishes the WAL record / birth / basic redo state-machine layer,
while P7 closes the ordinary remap / block-reference remap /
checkpoint-boundary replacement loop.

  I do not want to overclaim on performance. The numbers below should
be read as directional PoC signals, not as a final benchmark claim.

  On a TPC-C-style workload (BenchmarkSQL), the current results are:

  Throughput (`checksum=off`)

  terminals | md + fpw=on | md + fpw=off | Umbra + fpw=on
  ----------+-------------+--------------+----------------
  10        |      158709 |       154283 |         155781
  50        |      577005 |       626954 |         656353
  200       |      641899 |       981436 |         995635
  500       |      322660 |       943295 |         859058
  1000      |      275609 |       899631 |         729989

  Throughput (`checksum=on`)

  terminals | md + fpw=on | md + fpw=off | Umbra + fpw=on
  ----------+-------------+--------------+----------------
  10        |      155754 |       152025 |         150606
  50        |      601974 |       635597 |         650844
  200       |      621176 |      1015923 |         938311
  500       |      316950 |       972795 |         729801
  1000      |      282713 |       891770 |         674865

  WAL size ratio (`md + fpw=on` / `Umbra + fpw=on`)

  terminals | checksum=on | checksum=off
  ----------+-------------+--------------
  10        |        1.82 |         2.03
  50        |        2.11 |         2.51
  200       |        3.81 |         5.22
  500       |        4.58 |         6.90
  1000      |        4.87 |         6.55

  At 1000 terminals, Umbra recovers roughly 73% (`checksum=off`) or
64% (`checksum=on`) of the throughput gap between `md + fpw=on` and
`md + fpw=off`, while reducing WAL volume by roughly 4.9x
(`checksum=on`) or 6.6x (`checksum=off`).

  The `md + fpw=off` numbers should be read only as a sensitivity /
upper-bound reference, not as a correctness-equivalent baseline.

  Known follow-up work still includes:

  - deeper host-tree engineering around AIO
  - `CREATE DATABASE` `WAL_LOG` copy path
  - stronger primary/standby physical-page alignment validation
  - more complete production-grade space management
  - an explicit upper-layer owner model for `range-born / batch mapping
publish`

  The last point is worth calling out explicitly: the current
prototype has internal range-shaped lifecycle operations, but it does
not yet claim a generic upper-layer `RangeMap` contract. I do not
believe that should be introduced without a clear upper-layer use site
and owner model.

  For personal reasons, my availability for sustained follow-up may be
limited for some time. Rather than leave this work in a private or
half-documented state, I would prefer to put the current PoC and
design notes in front of the community while they are still coherent
and runnable.

  If the direction looks interesting, I would welcome discussion,
criticism, or a future maintainer/collaborator willing to continue the
engineering work from this baseline.

  Repository and design notes:

https://github.com/nayishan/postgre_umbra/tree/umbra-poc-pgmaster

  Regards,
  Mingwei Jia
  i(at)nayishan(dot)top
