| From: | Mingwei Jia <i(at)nayishan(dot)top> |
|---|---|
| To: | Álvaro Herrera <alvherre(at)kurilemu(dot)de> |
| Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Re: [RFC PATCH v2 RESEND 04/10] umbra: add patch 3 metadata disk format and identity mapping bootstrap |
| Date: | 2026-06-03 15:40:18 |
| Message-ID: | 4ca9becd-4ebb-4222-a5ae-f51d6eb34aa7@nayishan.top |
| Lists: | pgsql-hackers |
Hi Álvaro, all,
Thanks for the comments and help so far.
First, thanks to Tom, Bruce, and Robert for helping me get the
submission format into better shape. I will keep using the tar.gz
attachment format for future versions. Robert, I agree that the current
per-patch subjects are not enough. In v3 I will at least add real commit
messages to each patch, so that each patch explains what it adds and why
it is split that way.
Álvaro, I think your point about the fork abstraction raises a good
question. It made
me think more about what the common code should provide and what the
owner of relation-local storage should provide.
Looking at the patch again, I think part of the problem is that I mixed
too many things into the smgr interface. There are lifecycle hooks,
runtime mapping calls, WAL/redo calls, background maintenance, and a few
statistics counters. That probably makes the code harder to understand
than it needs to be.
smgrisinternalfork() is one example where the boundary is not clean
enough. I am also not sure that the map statistics counters should be in
the smgr-facing API.
So before trying to solve the much larger owner-defined fork problem, I
think I should first clean up the Umbra smgr interface. The smgr may
still be the right owner for the lblk-to-pblk metadata, because code
above smgr should not know physical block numbers. But the way the
prototype exposes that metadata today needs work.
I also do not see the current smgr placement merely as a shortcut around
the fork-abstraction problem. The lblk-to-pblk map is part of the
physical placement policy of the storage manager, not table-AM or
index-AM contents. Even if PostgreSQL eventually has a more general
owner-defined fork facility, I think this particular kind of metadata may
still naturally be owned by the smgr implementation.
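To make that ownership boundary concrete, here is a toy Python model
(not the Umbra code; every name in it is hypothetical) of a storage
manager that keeps the lblk-to-pblk map private. Callers read and write
by logical block only. As an assumption about the remap path, each write
goes to a freshly allocated physical block, and the previous pblk is set
aside for a later reclaim step instead of being overwritten.

```python
class RemappedStorage:
    """Toy smgr that owns a private logical -> physical block map.

    Hypothetical sketch, not the Umbra implementation.
    """

    def __init__(self):
        self._map = {}              # lblk -> pblk; never exposed to callers
        self._pages = {}            # pblk -> page contents (the "physical file")
        self._next_pblk = 0
        self.pending_reclaim = []   # old pblks kept until reclaim frees them

    def _alloc_pblk(self):
        pblk = self._next_pblk
        self._next_pblk += 1
        return pblk

    def write(self, lblk, data):
        """Write a logical block out of place; the caller never sees a pblk."""
        new_pblk = self._alloc_pblk()
        self._pages[new_pblk] = bytes(data)
        old_pblk = self._map.get(lblk)
        if old_pblk is not None:
            # Preserve the old physical page instead of overwriting it.
            self.pending_reclaim.append(old_pblk)
        self._map[lblk] = new_pblk

    def read(self, lblk):
        """Resolve lblk through the private map and return the page."""
        return self._pages[self._map[lblk]]
```

Nothing above this class needs to know that logical block 0 now lives at
a different physical block after it is rewritten; only the storage
manager consults or updates the map.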
One complication is buffering. The metadata fork adds a separate shared
cache for MAP pages, outside PostgreSQL's normal buffer manager for
relation pages. That follows from making the remapped smgr own the
mapping metadata: the MAP cache stores translation metadata, while the
normal buffer manager continues to cache relation pages by logical block
identity. This is another reason why the owner-defined fork question
looks larger than just allowing an AM to declare more forks.
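A minimal sketch of such a separate MAP cache follows. It assumes each
MAP page holds a fixed number of pblk entries (2048 here, i.e. an 8 kB
page of 4-byte entries with the page header ignored) and that a cold map
can be bootstrapped as an identity mapping (pblk == lblk). The cache is
keyed by (relation, map page number), whereas the buffer manager keys
relation pages by logical block. The LRU policy, the constant, and all
names are illustrative assumptions, not the patch's design.

```python
from collections import OrderedDict

ENTRIES_PER_MAP_PAGE = 2048  # assumption: 8 kB page / 4-byte pblk, header ignored


def identity_map_page(rel, map_page_no):
    """Hypothetical bootstrap loader: every lblk initially maps to itself."""
    base = map_page_no * ENTRIES_PER_MAP_PAGE
    return list(range(base, base + ENTRIES_PER_MAP_PAGE))


class MapPageCache:
    """Toy LRU cache of MAP pages, separate from the relation-page buffer cache."""

    def __init__(self, capacity, load_map_page):
        self.capacity = capacity
        self._load = load_map_page    # callback: (rel, map_page_no) -> list of pblks
        self._pages = OrderedDict()   # (rel, map_page_no) -> list of pblks
        self.misses = 0

    def translate(self, rel, lblk):
        """Return the pblk for lblk, loading its MAP page on a miss."""
        key = (rel, lblk // ENTRIES_PER_MAP_PAGE)
        if key in self._pages:
            self._pages.move_to_end(key)        # mark as most recently used
        else:
            self.misses += 1
            self._pages[key] = self._load(*key)
            if len(self._pages) > self.capacity:
                self._pages.popitem(last=False)  # evict least recently used
        return self._pages[key][lblk % ENTRIES_PER_MAP_PAGE]
```

With capacity 1 and the identity loader, translating lblk 5 and then
lblk 6 touches a single MAP page and costs one miss; lblk 3000 falls on
map page 1 and evicts map page 0.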
The reason Umbra needs this metadata is to make the logical/physical
split durable and redoable. When a page is first modified after a
checkpoint, PostgreSQL (with full_page_writes on) normally writes a
full-page image to WAL, because redo needs a page image that a torn write
cannot have damaged. Umbra tries to replace that inline WAL
page image, in eligible cases, with a preserved old physical page.
In that model, old_pblk is the content baseline and new_pblk is the
WAL-owned physical target. WAL records the old physical block, the new
physical block, and the resulting mapping state. During redo, the old
physical page can be used as the baseline, the WAL delta is applied, and
the reconstructed page is written to the new physical location.
So the idea is not to disable full_page_writes globally. It is to move
the recovery baseline from an inline full-page image in WAL to an old
physical page preserved by the remap and reclaim machinery.
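The redo path described above can be sketched roughly as follows. This
is a toy model, not the patch's WAL format: the record fields, the
(offset, bytes) delta encoding, and the helper names are hypothetical.
It assumes the reclaim machinery keeps old_pblk intact until no record
that may still be replayed refers to it.

```python
from dataclasses import dataclass, field


@dataclass
class RemapRecord:
    """Hypothetical WAL record shape; field names are illustrative only."""
    lblk: int
    old_pblk: int   # content baseline preserved by remap/reclaim
    new_pblk: int   # physical target owned by this WAL record
    delta: list = field(default_factory=list)  # [(offset, bytes), ...]


def apply_delta(page, delta):
    """Apply byte-range changes to a copy of the page."""
    buf = bytearray(page)
    for offset, data in delta:
        buf[offset:offset + len(data)] = data
    return bytes(buf)


def redo_remap(rec, pages, lblk_map):
    """Replay one record: baseline at old_pblk + delta -> page at new_pblk."""
    baseline = pages[rec.old_pblk]      # assumed not yet reclaimed
    pages[rec.new_pblk] = apply_delta(baseline, rec.delta)
    lblk_map[rec.lblk] = rec.new_pblk   # resulting mapping state
```

Because the baseline page at old_pblk is never modified, replaying the
same record twice produces the same page at new_pblk. That is what lets
the preserved old page stand in for the full-page image: an in-place
update would leave no untorn copy to start from.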
For v3, the immediate change I will make is to add real commit messages
to every patch. I will also make the cover letter point more directly to
the relevant design notes, especially the smgr-private metadata boundary
and the intended review scope.
I will also take into account the comments that the smgr-facing
interfaces are hard to understand. Larger changes to the interface shape, patch
boundaries, or code structure will need separate follow-up work, and I
expect to improve those parts incrementally. I will try to make the next
version easier to review.
Thanks again for the help so far. I think this direction is worth
exploring, and I would be very happy to keep working through the open
questions with others and see whether this approach can be made to work.
Regards,
Mingwei