| From: | Tomas Vondra <tomas(at)vondra(dot)me> |
|---|---|
| To: | assam258(at)gmail(dot)com |
| Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com> |
| Subject: | Re: RFC: PostgreSQL Storage I/O Transformation Hooks |
| Date: | 2025-12-30 01:19:15 |
| Message-ID: | e3214639-36b8-42ec-ac69-cb4379962fbc@vondra.me |
| Lists: | pgsql-hackers |
Please don't top-post. We generally prefer to reply in-line, which makes
it easier to follow the discussion. With top-posting I have to hunt for
what you are responding to.
On 12/29/25 03:35, Henson Choi wrote:
> Subject: Re: RFC: PostgreSQL Storage I/O Transformation Hooks
>
> Hi Tomas,
>
> Thank you for this critical feedback. Your concerns go to the heart of
> the proposal's viability, and I appreciate your directness.
>
>
> 1. Multiple Extensions and Hook Chaining
>
> You're right to question this. To be honest, I have significant doubts
> about allowing multiple transformation extensions simultaneously.
>
> The Transform ID coordination problem is real: without a registry or
> protocol between extensions, they cannot cooperate safely. Hook chaining
> for read/write operations might work (extension A encrypts, extension B
> compresses), but the Transform ID field creates conflicts.
>
> Perhaps I should be more direct: transformation hook chaining is not
> realistically possible with the current design. TDE extensions would
> need exclusive use of these hooks. This is a fundamental limitation I
> should have stated clearly in the RFC.
>
Isn't that just another argument against using hooks? Chaining is what
hooks do, and there's no protection against a hook being set by multiple
extensions.
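
To illustrate the point about chaining, here is a minimal standalone sketch of the usual save-previous-and-call pattern. It is not PostgreSQL code: `page_write_hook`, `ext_a_*`, `ext_b_*` and `core_write_page` are hypothetical names, and the XOR and increment operations merely stand in for real transformations. Nothing prevents both "extensions" from installing themselves, and the bytes that reach storage depend on load order.

```c
#include <stddef.h>

/* Hypothetical hook type: transforms a page buffer in place before a write. */
typedef void (*page_write_hook_type)(unsigned char *page, size_t len);

/* A single global hook pointer, as is conventional for hooks. */
page_write_hook_type page_write_hook = NULL;

/* "Extension A": XOR with 0x5A, standing in for encryption. */
page_write_hook_type ext_a_prev = NULL;

void ext_a_hook(unsigned char *page, size_t len)
{
    if (ext_a_prev)
        ext_a_prev(page, len);      /* chain to whoever was installed before */
    for (size_t i = 0; i < len; i++)
        page[i] ^= 0x5A;
}

void ext_a_init(void)
{
    ext_a_prev = page_write_hook;   /* remember the previous hook */
    page_write_hook = ext_a_hook;   /* install ours; no exclusivity check */
}

/* "Extension B": add 1 to every byte, standing in for another transform. */
page_write_hook_type ext_b_prev = NULL;

void ext_b_hook(unsigned char *page, size_t len)
{
    if (ext_b_prev)
        ext_b_prev(page, len);
    for (size_t i = 0; i < len; i++)
        page[i] = (unsigned char) (page[i] + 1);
}

void ext_b_init(void)
{
    ext_b_prev = page_write_hook;
    page_write_hook = ext_b_hook;
}

/* Hypothetical core call site: only sees "a hook is set", not how many. */
void core_write_page(unsigned char *page, size_t len)
{
    if (page_write_hook)
        page_write_hook(page, len);
}
```

Loading A and then B applies the XOR first and the increment second; loading B and then A reverses that, so the same input page produces different output. The core has no way to tell which of the two happened.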
>
> 2. pd_flags Reservation - I Hope You'll Consider This
>
> I understand your concern about reserving pd_flags bits for extensions.
> However, I'd like to ask you to consider the reasoning behind this choice.
>
> The 5-bit Transform ID serves a critical purpose: it allows the core to
> identify the page's transformation state without attempting decryption.
> This is important for:
>
> - Error reporting: "This page is encrypted with transform ID 5, but no
> extension is loaded to handle it"
> - Migration safety: Distinguishing between untransformed pages (ID=0)
> and transformed pages during gradual encryption
> - Crash recovery: The core can detect transformation state inconsistencies
>
> That said, I recognize pd_flags is precious and limited. Let me propose
> an alternative approach that might better align with core principles:
>
The information may be crucial, but pd_flags is simply not meant to be
used by extensions to store custom data.
> Instead of extension-specific Transform IDs, what if we allow extensions
> to reserve space at pd_upper (similar to how special space works at
> pd_special)?
>
> The core could manage a small flag (2-3 bits) indicating "N bytes at
> pd_upper are reserved for transformation metadata". By encoding N as
> multiples of 2 or 4 bytes, we maximize the flag's efficiency:
>
> - 2 bits encoding 4-byte multiples: 0-12 bytes (sufficient for most cases)
> - 3 bits encoding 4-byte multiples: 0-28 bytes (covers all reasonable needs)
> - 3 bits encoding 2-byte multiples: 0-14 bytes (finer granularity)
>
> This approach uses minimal pd_flags bits while providing substantial
> metadata space. It would:
>
> - Keep the flag in core control (not extension-specific)
> - Allow extensions to store IV, authentication tags, key version, etc.
> in a standardized location
> - Be self-describing (the flag tells you how much space is reserved)
> - Generalize beyond encryption (compression, checksums, etc. could use it)
>
> In our internal implementation, we actually add opaque bytes to
> PageHeader for encryption metadata. This pd_upper approach could
> formalize that pattern for extensions.
>
> I believe some form of page-level metadata for transformations is
> necessary. Would either approach (Transform ID or pd_upper reservation)
> be acceptable with the right design, or do you see fundamental issues
> with page-level transformation metadata itself?
>
AFAICS this is pretty much exactly what this patch aimed to do (also to
allow implementing TDE):
https://commitfest.postgresql.org/patch/3986/
Clearly, it's not as simple as it may seem, otherwise the patch would
not be WIP for 3 years.
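
For readers checking the arithmetic in the quoted pd_upper proposal, the size encoding can be sketched as below. This only illustrates the quoted numbers and is not part of any patch: the three `PD_*` flag values are the ones currently defined in bufpage.h, while `PD_XFORM_*`, the bit position, and the `xform_*` functions are assumptions made up for this example.

```c
#include <stdint.h>

/* Flag bits currently defined for pd_flags in bufpage.h. */
#define PD_HAS_FREE_LINES 0x0001
#define PD_PAGE_FULL      0x0002
#define PD_ALL_VISIBLE    0x0004

/* Hypothetical: a 3-bit field directly above the existing flags,
 * counting reserved metadata bytes in 4-byte units. */
#define PD_XFORM_SHIFT 3
#define PD_XFORM_BITS  3
#define PD_XFORM_UNIT  4
#define PD_XFORM_MASK  (((1u << PD_XFORM_BITS) - 1u) << PD_XFORM_SHIFT)

/* Largest reservation expressible with `bits` bits in `unit`-byte steps. */
unsigned xform_max_bytes(unsigned bits, unsigned unit)
{
    return ((1u << bits) - 1u) * unit;
}

/* Store the reserved size; returns -1 if it cannot be encoded. */
int xform_set_reserved(uint16_t *pd_flags, unsigned nbytes)
{
    if (nbytes % PD_XFORM_UNIT != 0 ||
        nbytes > xform_max_bytes(PD_XFORM_BITS, PD_XFORM_UNIT))
        return -1;

    /* Clear the field, then write the unit count; other bits are untouched. */
    *pd_flags = (uint16_t) ((*pd_flags & ~PD_XFORM_MASK) |
                            ((nbytes / PD_XFORM_UNIT) << PD_XFORM_SHIFT));
    return 0;
}

/* Recover the reserved size in bytes from the flag field. */
unsigned xform_get_reserved(uint16_t pd_flags)
{
    return ((pd_flags & PD_XFORM_MASK) >> PD_XFORM_SHIFT) * PD_XFORM_UNIT;
}
```

With these definitions, `xform_max_bytes(2, 4)`, `xform_max_bytes(3, 4)` and `xform_max_bytes(3, 2)` give 12, 28 and 14 bytes, matching the three options listed in the quoted text. Sizes that are not a multiple of the unit, or that exceed the maximum, are rejected rather than rounded.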
>
> 3. Maintenance Burden and Test Coverage
>
> I deeply appreciate this concern. Having worked across various DBMS
> implementations, I've seen solution vendors ship without comprehensive
> regression testing - but never a database vendor. DBMS maintenance is
> extraordinarily difficult, and storage errors are catastrophic.
>
> This is precisely why test_tde exists as a reference implementation. But
> you've identified the real issue: we need much stronger test coverage
> for the hooks themselves.
>
> The test cases should:
> - Detect when core changes break hook contracts
> - Verify hook behavior under all I/O paths (sync, async, error cases)
> - Validate critical section safety
> - Test interaction with checksums, crash recovery, replication
>
> I agree the current test coverage is insufficient for core inclusion.
> Would expanding the test suite to cover these scenarios address your
> maintenance concerns, or do you see fundamental fragility beyond what
> testing can solve?
>
I wasn't talking about test coverage. My point is we'd have to keep this
working forever, even if we choose to change how the SMGR works. Which
is not entirely theoretical.
>
> 4. Hooks vs Transform Layer - Pragmatic Timeline
>
> You suggested improving SMGR extensibility rather than adding hooks. I
> think you're architecturally right about the long-term direction.
>
> However, I want to be pragmatic about timelines:
>
> The hook and pd_flags approach, despite its limitations, can deliver
> working TDE in the shortest time. Organizations facing regulatory
> deadlines need something that works now, not in 2-3 years.
>
Others may see it differently, but my opinion is using pd_flags is a
dead end.
I realize users may wish for a solution "soon", but we're not going to
accept a flawed approach because of that. Exchanging short-term benefit
for long-term pain does not seem like a good trade-off.
> That said, your feedback has sparked a better idea: what if we think of
> this not as "SMGR extension" or "hooks" but as a pluggable Transform
> Layer that SMGR and WAL subsystems delegate to?
>
> Conceptually:
>
> Application Layer
> |
> Buffer Manager
> |
> +------------------+
> | Transform Layer | <-- Encryption, etc.
> +------------------+
> |
> SMGR / WAL
> |
> File I/O
>
> This is architecturally cleaner than scattered hooks, and more focused
> than full SMGR extensibility. The Transform Layer would:
>
> - Provide a unified interface for data transformation
> - Work across backend, frontend tools, and replication
> - Handle metadata management in a standardized way
> - Support encryption, compression, or other transformations
>
> I think this deserves its own discussion thread rather than conflating
> it with the current hook proposal. Would you be interested in starting a
> separate conversation about designing a Transform Layer interface for
> PostgreSQL?
>
Maybe. But I'm not convinced it'd be great to have many parallel threads
discussing approaches for the same ultimate end goal.
> In the meantime, the hook approach could serve organizations with
> immediate needs, and extensions could migrate to the Transform Layer
> once it's stabilized.
>
It's not like there are no alternatives, though. We have FDE/LUKS,
application-level encryption, etc. Now there's also pg_tde.
FWIW the hypothetical migration would be far from trivial.
>
> 5. Frontend Tool Access
>
> Both SMGR and hook approaches face a shared limitation: frontend tools
> (pg_checksums, pg_basebackup, etc.) that read files directly.
>
I'm not a TDE expert, but I don't see why tools like pg_basebackup would
need to be aware of this at all. A basebackup is just a filesystem copy.
> I previously suggested allowing initdb to specify a shared library that
> both backend and frontend can load for transformation. But as I
> reconsider this, it feels like it converges toward the Transform Layer
> idea: a well-defined interface that any PostgreSQL component can use.
>
> This might be the real architectural question: not "hooks vs SMGR" but
> "how should PostgreSQL provide transformation points that work across
> backend, frontend, and replication boundaries?"
>
Maybe. I was not proposing a new "transformation" layer, though. My
suggestion was entirely within the current SMGR architecture.
regards
--
Tomas Vondra