Re: RFC: PostgreSQL Storage I/O Transformation Hooks

From:	Henson Choi <assam258(at)gmail(dot)com>
To:	Tomas Vondra <tomas(at)vondra(dot)me>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com>
Subject:	Re: RFC: PostgreSQL Storage I/O Transformation Hooks
Date:	2025-12-30 02:37:20
Message-ID:	CAAAe_zAAp_7N=c+imUrmBw09NZ41SEgqcEWhtszui66N=qtjMA@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Dec 30, 2025 at 10:19 AM, Tomas Vondra <tomas(at)vondra(dot)me> wrote:

> Please don't top-post. We generally prefer to reply in-line, which makes
> it easier to follow the discussion. With top-posting I have to seek what
> are you responding to.
>

Apologies for the formatting error. I'll reply inline from now on.

> On 12/29/25 03:35, Henson Choi wrote:
> > Subject: Re: RFC: PostgreSQL Storage I/O Transformation Hooks
> >
> > Hi Tomas,
> >
> > Thank you for this critical feedback. Your concerns go to the heart of
> > the proposal's viability, and I appreciate your directness.
> >
> >
> > 1. Multiple Extensions and Hook Chaining
> >
> > You're right to question this. To be honest, I have significant doubts
> > about allowing multiple transformation extensions simultaneously.
> >
> > The Transform ID coordination problem is real: without a registry or
> > protocol between extensions, they cannot cooperate safely. Hook chaining
> > for read/write operations might work (extension A encrypts, extension B
> > compresses), but the Transform ID field creates conflicts.
> >
> > Perhaps I should be more direct: transformation hook chaining is not
> > realistically possible with the current design. TDE extensions would
> > need exclusive use of these hooks. This is a fundamental limitation I
> > should have stated clearly in the RFC.
> >
>
> Isn't that just another argument against using hooks? Chaining is what
> hooks do, and there's no protection against a hook being set by multiple
> extensions.
>

You're absolutely right. As I mentioned in my reply to Zsolt, I'm stepping
back from the hook approach to study the SMGR extensibility work first.

The chaining limitation you pointed out is fundamental - if TDE requires
exclusive access, then hooks are the wrong mechanism. I should have
reviewed existing SMGR extensibility efforts before proposing hooks.
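
To make the chaining point concrete, here is a minimal sketch of the usual
save-and-install hook pattern. The hook name page_write_hook, the two
extensions, and the XOR "transforms" are hypothetical stand-ins made up for
illustration; none of them are actual PostgreSQL symbols:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook type for illustration; not an actual PostgreSQL hook. */
typedef void (*page_write_hook_type) (unsigned char *page, size_t len);

/* Global hook pointer; NULL when no extension has installed itself. */
static page_write_hook_type page_write_hook = NULL;

/* Records the order in which the extensions ran. */
static char call_log[8];
static size_t call_count = 0;

static void
log_call(char who)
{
    assert(call_count < sizeof(call_log) - 1);
    call_log[call_count++] = who;
    call_log[call_count] = '\0';
}

/* Extension A: stand-in transform (XOR with a fixed byte). */
static page_write_hook_type ext_a_prev = NULL;

static void
ext_a_write(unsigned char *page, size_t len)
{
    for (size_t i = 0; i < len; i++)
        page[i] ^= 0x5A;
    log_call('A');
    if (ext_a_prev)
        ext_a_prev(page, len);  /* chain to whoever was installed before */
}

static void
ext_a_init(void)
{
    ext_a_prev = page_write_hook;   /* remember the previous hook */
    page_write_hook = ext_a_write;  /* install ourselves unconditionally */
}

/* Extension B: same pattern, different stand-in transform. */
static page_write_hook_type ext_b_prev = NULL;

static void
ext_b_write(unsigned char *page, size_t len)
{
    for (size_t i = 0; i < len; i++)
        page[i] ^= 0x3C;
    log_call('B');
    if (ext_b_prev)
        ext_b_prev(page, len);
}

static void
ext_b_init(void)
{
    ext_b_prev = page_write_hook;
    page_write_hook = ext_b_write;
}

/* What the core call site would do in this sketch. */
static void
core_write_page(unsigned char *page, size_t len)
{
    if (page_write_hook)
        page_write_hook(page, len);
}
```

If ext_a_init() and then ext_b_init() run, a single core_write_page() call
applies B's transform and then A's. Nothing in the pattern lets either
extension detect the other or claim the hook exclusively.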

>
> >
> > 2. pd_flags Reservation - I Hope You'll Consider This
> >
> > I understand your concern about reserving pd_flags bits for extensions.
> > However, I'd like to ask you to consider the reasoning behind this
> > choice.
> >
> > The 5-bit Transform ID serves a critical purpose: it allows the core to
> > identify the page's transformation state without attempting decryption.
> > This is important for:
> >
> > - Error reporting: "This page is encrypted with transform ID 5, but no
> > extension is loaded to handle it"
> > - Migration safety: Distinguishing between untransformed pages (ID=0)
> > and transformed pages during gradual encryption
> > - Crash recovery: The core can detect transformation state
> > inconsistencies
> >
> > That said, I recognize pd_flags is precious and limited. Let me propose
> > an alternative approach that might better align with core principles:
> >
>
> The information may be crucial, but pd_flags is simply not meant to be
> used by extensions to store custom data.
>
>
Understood. I see now why this is a non-starter.

> > Instead of extension-specific Transform IDs, what if we allow extensions
> > to reserve space at pd_upper (similar to how special space works at
> > pd_special)?
> >
> > The core could manage a small flag (2-3 bits) indicating "N bytes at
> > pd_upper are reserved for transformation metadata". By encoding N as
> > multiples of 2 or 4 bytes, we maximize the flag's efficiency:
> >
> > - 2 bits encoding 4-byte multiples: 0-12 bytes (sufficient for most
> > cases)
> > - 3 bits encoding 4-byte multiples: 0-28 bytes (covers all reasonable
> > needs)
> > - 3 bits encoding 2-byte multiples: 0-14 bytes (finer granularity)
> >
> > This approach uses minimal pd_flags bits while providing substantial
> > metadata space. It would:
> >
> > - Keep the flag in core control (not extension-specific)
> > - Allow extensions to store IV, authentication tags, key version, etc.
> > in a standardized location
> > - Be self-describing (the flag tells you how much space is reserved)
> > - Generalize beyond encryption (compression, checksums, etc. could use
> > it)
> >
> > In our internal implementation, we actually add opaque bytes to
> > PageHeader for encryption metadata. This pd_upper approach could
> > formalize that pattern for extensions.
> >
> > I believe some form of page-level metadata for transformations is
> > necessary. Would either approach (Transform ID or pd_upper reservation)
> > be acceptable with the right design, or do you see fundamental issues
> > with page-level transformation metadata itself?
> >
>
> AFAICS this is pretty much exactly what this patch aimed to do (also to
> allow implementing TDE):
>
> https://commitfest.postgresql.org/patch/3986/
>
> Clearly, it's not as simple as it may seem, otherwise the patch would
> not be WIP for 3 years.
>

Thank you - this is exactly what I needed to see. Combined with Zsolt's
pointer to the SMGR patch already in production, I clearly should have done
this research before proposing. I'll study both: the working SMGR solution
and why patch 3986 has been WIP for 3 years. That should give me proper
context.
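
For reference, the size arithmetic in the quoted pd_upper proposal can be
sketched as below. The helper names are hypothetical and exist only for this
illustration; this is not code from PostgreSQL or from patch 3986:

```c
#include <assert.h>

/*
 * Hypothetical helpers for the "N bytes reserved" flag field.
 * bits = width of the flag field, unit = byte granularity (2 or 4).
 */

/* Largest reservation representable by the field. */
static unsigned
reserved_max_bytes(unsigned bits, unsigned unit)
{
    assert(bits > 0 && bits < 16);
    return ((1u << bits) - 1u) * unit;
}

/* Encode a byte count into the field; returns -1 if not representable. */
static int
reserved_encode(unsigned nbytes, unsigned bits, unsigned unit)
{
    if (unit == 0 || nbytes % unit != 0)
        return -1;              /* not a multiple of the granularity */
    if (nbytes > reserved_max_bytes(bits, unit))
        return -1;              /* does not fit in the field */
    return (int) (nbytes / unit);
}

/* Decode the field back into a byte count. */
static unsigned
reserved_decode(unsigned field, unsigned unit)
{
    return field * unit;
}
```

With these definitions, reserved_max_bytes(2, 4), reserved_max_bytes(3, 4)
and reserved_max_bytes(3, 2) give the 12, 28 and 14 bytes listed in the
proposal. As one example of the trade-off, a 12-byte nonce plus a 16-byte
authentication tag (sizes commonly used with AES-GCM) needs 28 bytes, which
only the 3-bit, 4-byte variant can express.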

> >
> > 3. Maintenance Burden and Test Coverage
> >
> > I deeply appreciate this concern. Having worked across various DBMS
> > implementations, I've seen solution vendors ship without comprehensive
> > regression testing - but never a database vendor. DBMS maintenance is
> > extraordinarily difficult, and storage errors are catastrophic.
> >
> > This is precisely why test_tde exists as a reference implementation. But
> > you've identified the real issue: we need much stronger test coverage
> > for the hooks themselves.
> >
> > The test cases should:
> > - Detect when core changes break hook contracts
> > - Verify hook behavior under all I/O paths (sync, async, error cases)
> > - Validate critical section safety
> > - Test interaction with checksums, crash recovery, replication
> >
> > I agree the current test coverage is insufficient for core inclusion.
> > Would expanding the test suite to cover these scenarios address your
> > maintenance concerns, or do you see fundamental fragility beyond what
> > testing can solve?
> >
>
> I wasn't talking about test coverage. My point is we'd have to keep this
> working forever, even if we choose to change how the SMGR works. Which
> is not entirely theoretical.
>
>
I understand now. The maintenance burden isn't about testing - it's
about constraining future architectural evolution. Once hooks are in
core, they become an API contract that limits PostgreSQL's ability to
refactor SMGR.

This is exactly why SMGR extensibility is the right approach - it makes
the extension points explicit and architectural, rather than scattering
hooks that lock in implementation details.

> >
> > 4. Hooks vs Transform Layer - Pragmatic Timeline
> >
> > You suggested improving SMGR extensibility rather than adding hooks. I
> > think you're architecturally right about the long-term direction.
> >
> > However, I want to be pragmatic about timelines:
> >
> > The hook and pd_flags approach, despite its limitations, can deliver
> > working TDE in the shortest time. Organizations facing regulatory
> > deadlines need something that works now, not in 2-3 years.
> >
>
> Others may see it differently, but my opinion is using pd_flags is a
> dead end.
>
> I realize users may wish for a solution "soon", but we're not going to
> accept a flawed approach because of that. Exchanging short-term benefit
> for long-term pain does not seem like a good trade off.
>
>
Agreed. Companies are already running SMGR patches in production, which
can serve as a stopgap while the proper upstream solution is developed.
I'll study these approaches.

>
> > That said, your feedback has sparked a better idea: what if we think of
> > this not as "SMGR extension" or "hooks" but as a pluggable Transform
> > Layer that SMGR and WAL subsystems delegate to?
> >
> > Conceptually:
> >
> > Application Layer
> > |
> > Buffer Manager
> > |
> > +------------------+
> > | Transform Layer | <-- Encryption, etc.
> > +------------------+
> > |
> > SMGR / WAL
> > |
> > File I/O
> >
> > This is architecturally cleaner than scattered hooks, and more focused
> > than full SMGR extensibility. The Transform Layer would:
> >
> > - Provide a unified interface for data transformation
> > - Work across backend, frontend tools, and replication
> > - Handle metadata management in a standardized way
> > - Support encryption, compression, or other transformations
> >
> > I think this deserves its own discussion thread rather than conflating
> > it with the current hook proposal. Would you be interested in starting a
> > separate conversation about designing a Transform Layer interface for
> > PostgreSQL?
> >
>
> Maybe. But I'm not convinced it'd be great to have many parallel thread
> discussing approaches for the same ultimate end goal.
>
>
Understood about avoiding thread fragmentation.

I do wonder where bootstrap and frontend tool encryption should be
discussed - whether that belongs in the 3986 discussion or elsewhere -
but I should study that patch thoroughly first before raising the
question.

> > In the meantime, the hook approach could serve organizations with
> > immediate needs, and extensions could migrate to the Transform Layer
> > once it's stabilized.
> >
>
> It's not like there are no alternatives, though. We have FDE/LUKS,
> application-level encryption, etc. Now there's also pg_tde.
>
> FWIW the hypothetical migration would be far from trivial.
>
> >
> > 5. Frontend Tool Access
> >
> > Both SMGR and hook approaches face a shared limitation: frontend tools
> > (pg_checksums, pg_basebackup, etc.) that read files directly.
> >
>
> I'm not a TDE expert, but I don't see why would tools like pg_basebackup
> need to be aware of this at all. A basebackup is just a filesystem copy.
>
>
You're right - pg_basebackup itself just copies files. The issue I
mentioned was actually specific to our implementation (key storage
under PGDATA with symlinks), not a general TDE concern.

However, tools like pg_checksums that read data pages directly from disk,
or tools that read WAL pages, do raise a broader question: SMGR
extensibility handles backend I/O, but these frontend tools operate
outside that architecture.

This makes me wonder if a more comprehensive layer might be needed
to cover both backend (SMGR) and frontend tools. But I should study
the existing SMGR work first to see how this is currently addressed.

> > I previously suggested allowing initdb to specify a shared library that
> > both backend and frontend can load for transformation. But as I
> > reconsider this, it feels like it converges toward the Transform Layer
> > idea: a well-defined interface that any PostgreSQL component can use.
> >
> > This might be the real architectural question: not "hooks vs SMGR" but
> > "how should PostgreSQL provide transformation points that work across
> > backend, frontend, and replication boundaries?"
> >
>
> Maybe. I was not proposing a new "transformation" layer, though. My
> suggestion was entirely within the current SMGR architecture.
>

Understood.

Though I wonder whether WAL encryption belongs in the same discussion.
SMGR handles pages, but WAL has different characteristics.

Should it be covered in the patch 3986 thread, or separately?

>
> regards
>
>
> --
> Tomas Vondra
>
>
