Re: RFC: PostgreSQL Storage I/O Transformation Hooks

From: Henson Choi <assam258(at)gmail(dot)com>
To: Konstantin Knizhnik <knizhnik(at)garret(dot)ru>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com>
Subject: Re: RFC: PostgreSQL Storage I/O Transformation Hooks
Date: 2025-12-28 14:53:54
Message-ID: CAAAe_zCtqo1X1P_XDbffJihcP5_4KH-i+TZtLPd_D8s39Ku6Dw@mail.gmail.com
Lists: pgsql-hackers

Hi Konstantin,

I have great respect for the work being done on the extensible SMGR API.
It is a powerful tool for use cases that require replacing the entire
storage layer (like Neon's architecture).

However, I believe we should distinguish between Storage Management
(where/how data is stored) and Data Transformation (what the data looks
like). I see a strong case for both approaches to coexist for the
following practical reasons:

1. Separation of Concerns and Safety

Is it reasonable to ask cryptography experts to clone the entire SMGR
implementation and maintain code they don't fully understand just to
insert encryption logic? If an extension developer clones md.c to add
encryption, they become responsible for the fundamental integrity of
PostgreSQL's file I/O. Any bug in their cloned storage logic could lead
to data loss unrelated to encryption itself.

2. The Maintenance Debt of "Cloning"

When md.c receives critical security patches or bug fixes in the core,
every TDE extension maintainer would need to manually backport those
changes to their specific SMGR implementation. This creates a fragmented
ecosystem where security extensions might actually introduce storage
vulnerabilities by running outdated cloned logic.

3. Minimalist Integration

The hook approach allows crypto experts to focus strictly on transform()
and reverse_transform(). The complex storage orchestration remains with
the PostgreSQL core where it is most rigorously tested. This is a cleaner
separation of responsibilities: the core provides the trusted pipeline,
and the extension provides the specialized transformation.
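The division of labor in point 3 can be illustrated with a minimal sketch. It assumes a hypothetical `TransformOps` struct that carries only `transform()` and `reverse_transform()`. The functions `core_write_page()` and `core_read_page()` are illustrative stand-ins for the core-owned I/O path, not md.c functions, and the XOR "transform" is a placeholder, not encryption. The extension never touches file handling. On write, core transforms into a private scratch copy, so the shared buffer is never modified. On read, core undoes the transform.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLCKSZ 8192

/* Hypothetical contract: the only two functions an extension supplies. */
typedef struct TransformOps
{
    void (*transform)(const uint8_t *in, uint8_t *out, size_t len, uint32_t blkno);
    void (*reverse_transform)(const uint8_t *in, uint8_t *out, size_t len, uint32_t blkno);
} TransformOps;

/* Stand-in for the core-owned storage layer (blkno must be < 4 in this toy). */
static uint8_t fake_disk[4][BLCKSZ];
static uint8_t scratch[BLCKSZ];

/* Core write path: transform into a private copy, never the shared buffer. */
static void core_write_page(const TransformOps *ops, uint32_t blkno, const uint8_t *page)
{
    const uint8_t *src = page;

    if (ops != NULL)
    {
        ops->transform(page, scratch, BLCKSZ, blkno);
        src = scratch;
    }
    memcpy(fake_disk[blkno], src, BLCKSZ);
}

/* Core read path: fetch raw bytes, then let the extension undo its transform. */
static void core_read_page(const TransformOps *ops, uint32_t blkno, uint8_t *page)
{
    if (ops != NULL)
        ops->reverse_transform(fake_disk[blkno], page, BLCKSZ, blkno);
    else
        memcpy(page, fake_disk[blkno], BLCKSZ);
}

/* Toy "extension": XOR with a block-dependent byte. NOT encryption. */
static void toy_xor(const uint8_t *in, uint8_t *out, size_t len, uint32_t blkno)
{
    for (size_t i = 0; i < len; i++)
        out[i] = in[i] ^ (uint8_t) (0x5A + blkno);
}

static const TransformOps toy_ops = { toy_xor, toy_xor };
```

Passing `NULL` instead of `&toy_ops` gives the untransformed path, which is the behavior when no extension is loaded.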

Conclusion:

I believe these hooks provide a "low-barrier, high-safety" path for data
transformation that the SMGR API, being by nature a full replacement,
cannot easily provide. Let's provide the SMGR API for those who want to
reinvent the storage layer, and hooks for those who simply want to
secure the data.

Best regards,
Henson Choi

On Sun, Dec 28, 2025 at 9:11 PM Konstantin Knizhnik <knizhnik(at)garret(dot)ru> wrote:

>
> On 28/12/2025 9:49 AM, Henson Choi wrote:
>
> RFC: PostgreSQL Storage I/O Transformation Hooks Infrastructure for a
> Technical Protocol Between RDBMS Core and Data Security Experts
>
> *Author:* Henson Choi assam258(at)gmail(dot)com
>
> *Date:* 2025-12-28
>
> *PostgreSQL Version:* master (Development)
> ------------------------------
> 1. Summary & Motivation
>
> This RFC proposes the introduction of minimal hooks into the PostgreSQL
> storage layer and the addition of a *Transformation ID* field to the
> PageHeader.
>
> A Diplomatic Protocol Between Expert Groups
>
> The core motivation of this proposal is *“Separation of Concerns and
> Mutual Respect.”*
>
> Historically, discussions around Transparent Data Encryption (TDE) have
> often felt like putting security experts on trial in a foreign
> court—specifically, the “Court of RDBMS.” It is time to treat them not as
> defendants to be judged by database-specific rules, but as an *equal
> neighboring community* with their own specialized sovereignty.
>
> *The issue has never been a failure of technology, but rather a
> misplacement of the focal point.* While previous discussions were mired
> in the technicalities of “how to hardcode encryption into the core,” this
> proposal shifts the debate toward an architectural solution: “what
> interface the core should provide to external experts.”
>
> - *RDBMS Experts* provide a trusted pipeline responsible for data I/O
> paths and consistency.
> - *Security Experts* take responsibility for the specialized domain of
> encryption algorithms and key management.
>
> This hook system functions as a *Technical Protocol*—a high-level
> agreement that allows these two expert groups to exchange data securely
> without encroaching on each other’s territory.
> ------------------------------
> 2. Design Principles
>
> 1. *Delegation of Authority:* The core remains independent of specific
> encryption standards, providing a “free territory” where security experts
> can respond to an ever-changing security landscape.
> 2. *Diplomatic Convention:* The Transformation ID acts as a
> communication protocol between the engine and the extension. The engine
> uses this ID to identify the state of the data and hands over control to
> the appropriate expert (the extension).
> 3. *Minimal Interference:* Overhead is kept near zero when hooks are
> not in use, ensuring the native performance of the PostgreSQL engine.
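Here is a minimal sketch of how the "Diplomatic Convention" and "Minimal Interference" principles might combine on the read path. It assumes a hypothetical registry in which each extension claims one Transformation ID from 1 to 31, with 0 reserved for "untransformed". That reservation is an assumption, not something the RFC states. `register_transform()` and `dispatch_reverse()` are illustrative names, not part of the proposal. An untransformed page costs a single comparison. A page that carries an ID no loaded extension has claimed is reported as an error rather than passed on as unreadable bytes.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_TRANSFORM_IDS 32   /* 5 bits => IDs 0..31; 0 = untransformed (assumed) */

typedef void (*reverse_fn)(uint8_t *page, size_t len);

/* Hypothetical registry: each extension claims one ID at load time. */
static reverse_fn transform_registry[MAX_TRANSFORM_IDS];

/* Returns 0 on success, -1 if the ID is invalid or already claimed. */
static int register_transform(unsigned id, reverse_fn fn)
{
    if (id == 0 || id >= MAX_TRANSFORM_IDS || transform_registry[id] != NULL)
        return -1;
    transform_registry[id] = fn;
    return 0;
}

/*
 * Read-path dispatch. ID 0 costs one comparison and no call.
 * An unknown non-zero ID means the page was written by an extension
 * that is not loaded, so it is an error.
 */
static int dispatch_reverse(unsigned id, uint8_t *page, size_t len)
{
    if (id == 0)
        return 0;
    if (id >= MAX_TRANSFORM_IDS || transform_registry[id] == NULL)
        return -1;
    transform_registry[id](page, len);
    return 0;
}

/* Toy reverse transform used only for illustration. */
static void invert_bytes(uint8_t *page, size_t len)
{
    for (size_t i = 0; i < len; i++)
        page[i] = (uint8_t) ~page[i];
}
```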
>
> ------------------------------
> 3. Proposal Specifications
>
> 3.1 The Interface (Hook Points)
>
> We allow intervention by security experts through five contact points
> along the I/O path:
>
> - *Read/Write Hooks:* mdread_post, mdwrite_pre, mdextend_pre
> (Transformation of the data area)
> - *WAL Hooks:* xlog_insert_pre, xlog_decode_pre (Transformation of
> transaction logs)
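The RFC names the five hooks but not their signatures. The sketch below assumes plausible argument lists and follows PostgreSQL's usual hook convention: a global function pointer that stays NULL until an extension installs it. `sketch_mdwrite()` and `sketch_mdread()` are simplified, hypothetical stand-ins for the real call sites in md.c. `toy_write()` and `toy_read()` are a placeholder XOR, not a cipher. The WAL hooks are only declared here, because their call sites would live in the WAL insertion and decoding code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical signatures; the RFC names the hooks but not their arguments. */
typedef void (*mdread_post_hook_type)(uint32_t blkno, void *buffer);
typedef void (*mdwrite_pre_hook_type)(uint32_t blkno, const void *buffer, void *out);
typedef void (*mdextend_pre_hook_type)(uint32_t blkno, const void *buffer, void *out);
typedef void (*xlog_insert_pre_hook_type)(void *record, size_t len);
typedef void (*xlog_decode_pre_hook_type)(void *record, size_t len);

/* NULL unless an extension installs them. */
mdread_post_hook_type mdread_post_hook = NULL;
mdwrite_pre_hook_type mdwrite_pre_hook = NULL;
mdextend_pre_hook_type mdextend_pre_hook = NULL;
xlog_insert_pre_hook_type xlog_insert_pre_hook = NULL;
xlog_decode_pre_hook_type xlog_decode_pre_hook = NULL;

#define PAGE_SZ 16   /* tiny page for illustration */

/* Simplified write call site: returns the bytes that would reach the file. */
static const void *sketch_mdwrite(uint32_t blkno, const void *buffer, void *scratch)
{
    if (mdwrite_pre_hook == NULL)
        return buffer;          /* fast path: a single pointer test */
    mdwrite_pre_hook(blkno, buffer, scratch);
    return scratch;
}

/* Simplified read call site: transform in place after the raw read. */
static void sketch_mdread(uint32_t blkno, void *buffer)
{
    if (mdread_post_hook != NULL)
        mdread_post_hook(blkno, buffer);
}

/* Toy hooks: XOR with the block number (placeholder, NOT encryption). */
static void toy_write(uint32_t blkno, const void *in, void *out)
{
    const uint8_t *s = in;
    uint8_t *d = out;

    for (size_t i = 0; i < PAGE_SZ; i++)
        d[i] = s[i] ^ (uint8_t) blkno;
}

static void toy_read(uint32_t blkno, void *buf)
{
    uint8_t *p = buf;

    for (size_t i = 0; i < PAGE_SZ; i++)
        p[i] ^= (uint8_t) blkno;
}
```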
>
> 3.2 The Protocol Identifier (PageHeader Transformation ID)
>
> We allocate 5 bits of pd_flags to define the “Security State” of a page.
> This serves as a *Status Message* sent by the security expert to the
> engine, utilized for key versioning and as a migration marker.
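Five bits give 32 distinct values. If 0 is reserved for "not transformed" (an assumption), up to 31 IDs, for example key versions, can coexist. In PostgreSQL's bufpage.h, the existing pd_flags bits occupy the mask 0x0007 (PD_HAS_FREE_LINES, PD_PAGE_FULL, PD_ALL_VISIBLE). The RFC does not say which bits the Transformation ID would use. Placing it at bits 3 to 7 below is an assumption, and the helper names are hypothetical.

```c
#include <stdint.h>

/*
 * Assumed layout: existing flags use 0x0007, so the 5-bit
 * Transformation ID goes in bits 3..7 (mask 0x00F8).
 */
#define PD_XFORM_SHIFT 3
#define PD_XFORM_MASK  ((uint16_t) (0x1F << PD_XFORM_SHIFT))

static inline unsigned page_get_xform_id(uint16_t pd_flags)
{
    return (pd_flags & PD_XFORM_MASK) >> PD_XFORM_SHIFT;
}

/* Sets the ID while preserving all other flag bits; -1 if id needs more than 5 bits. */
static inline int page_set_xform_id(uint16_t *pd_flags, unsigned id)
{
    if (id > 0x1F)
        return -1;
    *pd_flags = (uint16_t) ((*pd_flags & ~PD_XFORM_MASK) | (id << PD_XFORM_SHIFT));
    return 0;
}
```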
> ------------------------------
> 4. Reference Implementation: contrib/test_tde
>
> A Standard Code of Conduct for Security Experts
>
> This reference implementation exists not as a commercial product, but to
> define the *Standards of the Diplomatic Protocol* that
> encryption/decryption experts must follow when entering the PostgreSQL
> domain.
>
> 1. *Deterministic IV Derivation:* Demonstrates how to achieve
> cryptographic safety by trusting unique values provided by the engine
> (e.g., LSN).
> 2. *Critical Section Safety:* Defines memory management regulations
> that security logic must follow within “Critical Sections” to maintain
> system stability.
> 3. *Hook Chaining:* Demonstrates a cooperative structure that allows
> peaceful coexistence with other expert tools (e.g., compression, auditing).
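The three rules above can be sketched together. The sketch assumes a hypothetical `write_hook` variable owned by core and uses a toy keystream in place of a real cipher such as AES-CTR. `derive_iv()` builds a 16-byte IV from the page LSN and the block number; the byte layout is an illustrative choice. `tde_write_hook()` writes only into a buffer preallocated at load time, because any error raised inside a critical section, including an allocation failure, is escalated to a PANIC. `tde_init()` saves and chains to any previously installed hook, so an earlier-loaded tool still runs. Here that tool is a stand-in auditing hook. All names are hypothetical. One caveat applies to the IV rule: the LSN does not necessarily advance on every page write. For example, hint-bit-only changes are not WAL-logged when neither data checksums nor wal_log_hints are enabled. A real design therefore has to rule out IV reuse explicitly.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SZ 16   /* tiny page for illustration */

typedef void (*write_hook_type)(uint64_t lsn, uint32_t blkno, const uint8_t *in, uint8_t *out);

static write_hook_type write_hook = NULL;       /* hypothetical core-owned hook */
static write_hook_type prev_write_hook = NULL;  /* saved by this extension */

/* Preallocated at load time: nothing is allocated inside a critical section. */
static uint8_t staging[PAGE_SZ];

/* Deterministic IV: LSN (8 bytes, big-endian) || block number (4 bytes) || 4 zero bytes. */
static void derive_iv(uint64_t lsn, uint32_t blkno, uint8_t iv[16])
{
    memset(iv, 0, 16);
    for (int i = 0; i < 8; i++)
        iv[i] = (uint8_t) (lsn >> (56 - 8 * i));
    for (int i = 0; i < 4; i++)
        iv[8 + i] = (uint8_t) (blkno >> (24 - 8 * i));
}

/* Toy keystream standing in for a real cipher. NOT secure. */
static void toy_cipher(const uint8_t iv[16], const uint8_t *in, uint8_t *out, size_t len)
{
    for (size_t i = 0; i < len; i++)
        out[i] = in[i] ^ iv[i % 16] ^ (uint8_t) i;
}

/* Run any earlier hook first (e.g., compression), then encrypt its output. */
static void tde_write_hook(uint64_t lsn, uint32_t blkno, const uint8_t *in, uint8_t *out)
{
    uint8_t iv[16];
    const uint8_t *src = in;

    if (prev_write_hook != NULL)
    {
        prev_write_hook(lsn, blkno, in, staging);
        src = staging;
    }
    derive_iv(lsn, blkno, iv);
    toy_cipher(iv, src, out, PAGE_SZ);
}

/* Load-time initialization: chain onto whatever hook was installed before us. */
static void tde_init(void)
{
    prev_write_hook = write_hook;
    write_hook = tde_write_hook;
}

/* A stand-in for an independent tool loaded earlier (e.g., auditing). */
static int audit_calls = 0;

static void audit_write_hook(uint64_t lsn, uint32_t blkno, const uint8_t *in, uint8_t *out)
{
    (void) lsn;
    (void) blkno;
    audit_calls++;
    memcpy(out, in, PAGE_SZ);
}
```

With the auditing hook installed first, `tde_init()` chains onto it. Each write then runs the audit step and encrypts the result. The same LSN and block number always yield the same output, and a different LSN yields a different one.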
>
> ------------------------------
> 5. Scope
>
> - *In-Scope:* Backend hook infrastructure, Transformation ID field,
> and reference code demonstrating diplomatic protocol compliance.
> - *Out-of-Scope:* Specific Key Management Systems (KMS), selection of
> specific cryptographic algorithms, and integration with external tools.
>
> This proposal represents a strategic diplomatic choice: rather than the
> PostgreSQL core assuming all security responsibilities, it grants security
> experts a *sovereign territory through extensions* where they can perform
> at their best.
>
> I wonder if, instead of supporting a lot of extra hooks, it would be better
> to provide an extensible SMGR API:
>
> https://www.postgresql.org/message-id/flat/CAPP%3DHha_wV1MV9yR70QZ5pk5dtNP%2BbOyBiFxPmrMKqnQeKMAwQ%40mail.gmail.com#ab0da3412525c7501ea17f3d4c602bbf
> It seems to be a much more straightforward, convenient, and flexible
> mechanism than adding hooks, and it can be used for many purposes other
> than transparent encryption.