Re: RFC: PostgreSQL Storage I/O Transformation Hooks

From: Henson Choi <assam258(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com>
Subject: Re: RFC: PostgreSQL Storage I/O Transformation Hooks
Date: 2025-12-28 14:19:02
Message-ID: CAAAe_zD1AmebOUUFUi4TsPjf=sWA9CMiNH8eHfkmQE_nxEJDTQ@mail.gmail.com
Lists: pgsql-hackers

Hi,

Here is v3 of the Storage I/O Transform Hooks patch.

Changes from v2:
- Fix -Wincompatible-pointer-types error in bufmgr.c by casting
&bufdata to (void **) for mdread_post_hook call

v2 changes were:
- Add meson.build test configuration for test_tde extension

--
Best regards,
Henson Choi

On Sun, Dec 28, 2025 at 7:44 PM, Henson Choi <assam258(at)gmail(dot)com> wrote:

> Updated patches with meson build support:
>
> v2:
> - Added meson.build for test_tde extension
> - Added test_tde to contrib/meson.build
>
> Regards,
> Henson Choi
>
> On Sun, Dec 28, 2025 at 6:47 PM, Henson Choi <assam258(at)gmail(dot)com> wrote:
>
>> Hello,
>>
>> Following up on the RFC, I am submitting the initial patch set for the
>> proposed infrastructure. These patches introduce a minimal hook-based
>> protocol to allow extensions to handle data transformation, such as TDE,
>> while keeping the PostgreSQL core independent of specific cryptographic
>> implementations.
>>
>> Implementation Details:
>>
>> Hook Points in Storage I/O Path
>> The patch introduces five strategic hook points:
>>
>> mdread_post_hook: Called after blocks are read from disk. The extension
>> can reverse-transform data in place.
>>
>> mdwrite_pre_hook & mdextend_pre_hook: Called before writing or extending
>> blocks. These hooks return a pointer to transformed buffers.
>>
>> xlog_insert_pre_hook & xlog_decode_pre_hook: Handle transformation for
>> WAL records during insertion and replay.
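To make the calling convention concrete, here is a minimal standalone sketch of how a post-read hook could be declared and invoked. The hook signature, the `read_block` helper, and the XOR "transform" are assumptions made for illustration only; the signatures in the actual patch are not shown in this message and may differ (the v3 note about casting to `(void **)` suggests they do).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLCKSZ 8192             /* PostgreSQL's default block size */

/* Hypothetical signature; the one in the patch may differ. */
typedef void (*mdread_post_hook_type) (uint32_t blocknum, void *buffer);

/* NULL by default; an extension would assign it in _PG_init(). */
static mdread_post_hook_type mdread_post_hook = NULL;

/* Stand-in for the storage read path (hypothetical helper). */
static void
read_block(const unsigned char *disk, uint32_t blocknum, unsigned char *buffer)
{
    memcpy(buffer, disk + (size_t) blocknum * BLCKSZ, BLCKSZ);

    /* With no extension loaded, this comparison is the only added work. */
    if (mdread_post_hook != NULL)
        mdread_post_hook(blocknum, buffer);
}

/* Toy in-place reverse transform: XOR with a constant. NOT encryption. */
static void
toy_reverse_transform(uint32_t blocknum, void *buffer)
{
    unsigned char *p = buffer;

    (void) blocknum;            /* a real cipher would derive its IV from this */
    for (size_t i = 0; i < BLCKSZ; i++)
        p[i] ^= 0x5A;
}
```

Setting `mdread_post_hook = toy_reverse_transform` before calling `read_block` makes every block pass through the transform; leaving it NULL keeps the read path unchanged.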
>>
>> Data Integrity and Checksum Protocol
>> To ensure robust error detection, the hooks follow a specific
>> verification protocol:
>>
>> On Write: The extension transforms the page, sets the Transform ID, then
>> recalculates the checksum on the transformed data.
>>
>> On Read: The extension verifies the on-disk checksum of the transformed
>> data first. After reverse-transformation, it clears the Transform ID and
>> recalculates the checksum for the plaintext data. This ensures corruption
>> is detected regardless of the transformation state.
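The ordering of the write and read steps above can be sketched on a toy page. Everything here is a stand-in: `ToyPage`, the Fletcher-style `toy_checksum`, and the XOR transform are hypothetical and only illustrate the sequence (transform, tag, checksum on write; verify, reverse, untag, re-checksum on read). For brevity the sketch transforms in place, whereas the write hooks described above return a separate buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BODY 64            /* toy size; real pages are BLCKSZ bytes */

/* Toy page holding only the fields the protocol touches. */
typedef struct ToyPage
{
    uint16_t    checksum;
    uint8_t     transform_id;   /* 0 means untransformed */
    uint8_t     body[PAGE_BODY];
} ToyPage;

/* Placeholder checksum over transform_id and body (not PostgreSQL's). */
static uint16_t
toy_checksum(const ToyPage *page)
{
    uint32_t    a = page->transform_id;
    uint32_t    b = 0;

    for (size_t i = 0; i < PAGE_BODY; i++)
    {
        a = (a + page->body[i]) % 255;
        b = (b + a) % 255;
    }
    return (uint16_t) ((b << 8) | a);
}

/* Placeholder transform; XOR is its own inverse. NOT encryption. */
static void
toy_xor(ToyPage *page)
{
    for (size_t i = 0; i < PAGE_BODY; i++)
        page->body[i] ^= 0xA5;
}

static void
page_on_write(ToyPage *page, uint8_t id)
{
    toy_xor(page);                          /* 1. transform the page */
    page->transform_id = id;                /* 2. set the Transform ID */
    page->checksum = toy_checksum(page);    /* 3. checksum the transformed data */
}

/* Returns false if the on-disk checksum does not match. */
static bool
page_on_read(ToyPage *page)
{
    if (page->checksum != toy_checksum(page))
        return false;                       /* detected before any decryption */

    if (page->transform_id != 0)
    {
        toy_xor(page);                      /* reverse-transform */
        page->transform_id = 0;             /* clear the Transform ID */
        page->checksum = toy_checksum(page);    /* checksum the plaintext */
    }
    return true;
}
```

Verifying the checksum before the reverse transform means a damaged block is rejected without ever being fed to the cipher.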
>>
>> WAL Safety via XLR_BLOCK_ID_TRANSFORMED (251)
>> For WAL records, I have introduced a specific block ID (251) to mark
>> transformed data. If the decryption extension is not loaded, the WAL reader
>> will encounter this unknown block ID and fail fast, preventing the system
>> from incorrectly interpreting encrypted data as valid WAL records.
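The fail-fast behaviour can be illustrated with a standalone sketch of block ID dispatch. The existing `XLR_BLOCK_ID_*` values are those defined in PostgreSQL's xlogrecord.h and 251 is the value proposed above; the hook signature, `decode_block_header`, and `DecodeStatus` are hypothetical names, and the real decoder is structured differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Existing values from PostgreSQL's xlogrecord.h */
#define XLR_MAX_BLOCK_ID            32
#define XLR_BLOCK_ID_DATA_SHORT     255
#define XLR_BLOCK_ID_DATA_LONG      254
#define XLR_BLOCK_ID_ORIGIN         253
#define XLR_BLOCK_ID_TOPLEVEL_XID   252
/* Value proposed in this patch set */
#define XLR_BLOCK_ID_TRANSFORMED    251

/* Hypothetical signature: reverse-transform data in place, true on success. */
typedef bool (*xlog_decode_pre_hook_type) (uint8_t *data, size_t len);

static xlog_decode_pre_hook_type xlog_decode_pre_hook = NULL;

typedef enum { DECODE_OK, DECODE_FAIL } DecodeStatus;

static DecodeStatus
decode_block_header(uint8_t block_id, uint8_t *data, size_t len)
{
    /* Ordinary block references and the existing special IDs. */
    if (block_id <= XLR_MAX_BLOCK_ID || block_id >= XLR_BLOCK_ID_TOPLEVEL_XID)
        return DECODE_OK;

    /* Transformed payload: only decodable if an extension is loaded. */
    if (block_id == XLR_BLOCK_ID_TRANSFORMED && xlog_decode_pre_hook != NULL)
        return xlog_decode_pre_hook(data, len) ? DECODE_OK : DECODE_FAIL;

    /* Unknown ID, or 251 with no hook installed: refuse to interpret it. */
    return DECODE_FAIL;
}

/* Toy hook standing in for a decryption extension. NOT encryption. */
static bool
toy_decode_hook(uint8_t *data, size_t len)
{
    for (size_t i = 0; i < len; i++)
        data[i] ^= 0x5A;
    return true;
}
```

Without a hook, a record tagged 251 is rejected the same way as any other unrecognized block ID, so ciphertext is never parsed as WAL.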
>>
>> PageHeader Transform ID (5-bit)
>> I have allocated bits 3-7 of pd_flags in the PageHeader for a Transform
>> ID. This allows the engine and extensions to identify the transformation
>> state of a page (e.g., key versioning or algorithm type) without attempting
>> decryption. It ensures backward compatibility: pages with Transform ID 0
>> are treated as standard untransformed pages.
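A 5-bit field in bits 3-7 of the 16-bit pd_flags can be read and written with a mask and shift, as in the sketch below. The three `PD_*` flag values are the existing ones from PostgreSQL's bufpage.h; the `PD_TRANSFORM_*` names and the two accessor functions are hypothetical and may not match what the patch defines.

```c
#include <assert.h>
#include <stdint.h>

/* Existing pd_flags bits from PostgreSQL's bufpage.h (bits 0-2) */
#define PD_HAS_FREE_LINES   0x0001
#define PD_PAGE_FULL        0x0002
#define PD_ALL_VISIBLE      0x0004

/* Hypothetical names for the proposed field in bits 3-7 */
#define PD_TRANSFORM_SHIFT  3
#define PD_TRANSFORM_MASK   0x00F8
#define PD_TRANSFORM_MAX    31      /* 5 bits: IDs 0..31, 0 = untransformed */

static inline uint8_t
page_get_transform_id(uint16_t pd_flags)
{
    return (uint8_t) ((pd_flags & PD_TRANSFORM_MASK) >> PD_TRANSFORM_SHIFT);
}

/* Returns pd_flags with the Transform ID replaced; other bits are kept. */
static inline uint16_t
page_set_transform_id(uint16_t pd_flags, uint8_t id)
{
    assert(id <= PD_TRANSFORM_MAX);
    return (uint16_t) ((pd_flags & ~PD_TRANSFORM_MASK) |
                       ((uint16_t) id << PD_TRANSFORM_SHIFT));
}
```

Because existing pages have bits 3-7 clear, `page_get_transform_id` returns 0 for them, which is what makes ID 0 a natural "untransformed" marker.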
>>
>> Memory and Critical Section Safety
>> As demonstrated in the contrib/test_tde reference implementation, cipher
>> contexts are pre-allocated in _PG_init to avoid memory allocation during
>> critical sections. For WAL transformation,
>> MemoryContextAllowInCriticalSection() is used to allow buffer reallocation
>> within critical sections; if OOM occurs during buffer growth, it results in
>> a controlled PANIC.
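The pre-allocation idea can be shown in isolation: allocate once at load time, then reuse that memory on the write path so nothing is allocated where failure cannot be handled. This is a simplified sketch using plain `malloc`; `toy_init`, `toy_write_transform`, and `scratch_page` are hypothetical names, and the real extension would use PostgreSQL memory contexts and a real cipher context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define BLCKSZ 8192             /* PostgreSQL's default block size */

/* Hypothetical per-backend scratch buffer, allocated once at load time. */
static unsigned char *scratch_page = NULL;

/* Analogous to _PG_init(): the only place that allocates. */
static bool
toy_init(void)
{
    scratch_page = malloc(BLCKSZ);
    return scratch_page != NULL;
}

/*
 * Pre-write transform: fills the pre-allocated buffer and returns it.
 * It never allocates and never modifies the caller's page.
 */
static const unsigned char *
toy_write_transform(const unsigned char *page)
{
    assert(scratch_page != NULL);   /* toy_init() must have run first */

    for (size_t i = 0; i < BLCKSZ; i++)
        scratch_page[i] = page[i] ^ 0x5A;   /* placeholder, NOT a cipher */
    return scratch_page;
}
```

Returning a separate buffer also leaves the in-memory page in plaintext, which matches the description of the write hooks returning a pointer to transformed buffers.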
>>
>> Performance Considerations
>> When hooks are not set (default), the overhead is limited to a single
>> NULL pointer comparison per I/O operation. This is architecturally
>> consistent with existing PostgreSQL hooks and is designed to have a
>> negligible impact on performance.
>>
>> Attached Patches:
>>
>> v20251228-0001-Add-Storage-I-O-Transform-Hooks-for-PostgreSQL.patch: Core
>> infrastructure.
>> v20251228-0002-Add-test_tde-extension-for-TDE-testing.patch: Reference
>> implementation using AES-256-CTR.
>>
>> I look forward to your comments and feedback.
>>
>> Regards,
>>
>> Henson Choi
>>
>> On Sun, Dec 28, 2025 at 4:49 PM, Henson Choi <assam258(at)gmail(dot)com> wrote:
>>
>>> RFC: PostgreSQL Storage I/O Transformation Hooks
>>>
>>> Infrastructure for a Technical Protocol Between RDBMS Core and Data
>>> Security Experts
>>>
>>> *Author:* Henson Choi assam258(at)gmail(dot)com
>>>
>>> *Date:* 2025-12-28
>>>
>>> *PostgreSQL Version:* master (Development)
>>> ------------------------------
>>> 1. Summary & Motivation
>>>
>>> This RFC proposes the introduction of minimal hooks into the PostgreSQL
>>> storage layer and the addition of a *Transformation ID* field to the
>>> PageHeader.
>>>
>>> A Diplomatic Protocol Between Expert Groups
>>>
>>> The core motivation of this proposal is *“Separation of Concerns and
>>> Mutual Respect.”*
>>>
>>> Historically, discussions around Transparent Data Encryption (TDE) have
>>> often felt like putting security experts on trial in a foreign
>>> court—specifically, the “Court of RDBMS.” It is time to treat them not as
>>> defendants to be judged by database-specific rules, but as an *equal
>>> neighboring community* with their own specialized sovereignty.
>>>
>>> *The issue has never been a failure of technology, but rather a
>>> misplacement of the focal point.* While previous discussions were mired
>>> in the technicalities of “how to hardcode encryption into the core,” this
>>> proposal shifts the debate toward an architectural solution: “what
>>> interface the core should provide to external experts.”
>>>
>>> - *RDBMS Experts* provide a trusted pipeline responsible for data
>>> I/O paths and consistency.
>>> - *Security Experts* take responsibility for the specialized domain
>>> of encryption algorithms and key management.
>>>
>>> This hook system functions as a *Technical Protocol*—a high-level
>>> agreement that allows these two expert groups to exchange data securely
>>> without encroaching on each other’s territory.
>>> ------------------------------
>>> 2. Design Principles
>>>
>>> 1. *Delegation of Authority:* The core remains independent of
>>> specific encryption standards, providing a “free territory” where security
>>> experts can respond to an ever-changing security landscape.
>>> 2. *Diplomatic Convention:* The Transformation ID acts as a
>>> communication protocol between the engine and the extension. The engine
>>> uses this ID to identify the state of the data and hands over control to
>>> the appropriate expert (the extension).
>>> 3. *Minimal Interference:* Overhead is kept near zero when hooks are
>>> not in use, ensuring the native performance of the PostgreSQL engine.
>>>
>>> ------------------------------
>>> 3. Proposal Specifications
>>>
>>> 3.1 The Interface (Hook Points)
>>>
>>> We allow intervention by security experts through five contact points
>>> along the I/O path:
>>>
>>> - *Read/Write Hooks:* mdread_post, mdwrite_pre, mdextend_pre
>>> (Transformation of the data area)
>>> - *WAL Hooks:* xlog_insert_pre, xlog_decode_pre (Transformation of
>>> transaction logs)
>>>
>>> 3.2 The Protocol Identifier (PageHeader Transformation ID)
>>>
>>> We allocate 5 bits of pd_flags to define the “Security State” of a
>>> page. This serves as a *Status Message* sent by the security expert to
>>> the engine, utilized for key versioning and as a migration marker.
>>> ------------------------------
>>> 4. Reference Implementation: contrib/test_tde
>>>
>>> A Standard Code of Conduct for Security Experts
>>>
>>> This reference implementation exists not as a commercial product, but to
>>> define the *Standards of the Diplomatic Protocol* that
>>> encryption/decryption experts must follow when entering the PostgreSQL
>>> domain.
>>>
>>> 1. *Deterministic IV Derivation:* Demonstrates how to achieve
>>> cryptographic safety by trusting unique values provided by the engine
>>> (e.g., LSN).
>>> 2. *Critical Section Safety:* Defines memory management regulations
>>> that security logic must follow within “Critical Sections” to maintain
>>> system stability.
>>> 3. *Hook Chaining:* Demonstrates a cooperative structure that allows
>>> peaceful coexistence with other expert tools (e.g., compression, auditing).
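Hook chaining, the third item above, usually follows the pattern already used by existing PostgreSQL hooks: each extension saves the previous hook value at load time and calls it from its own hook. The sketch below shows that pattern with two toy extensions; the hook signature and the `ext_a_*` / `ext_b_*` names are hypothetical, and the arithmetic stands in for real transforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical signature; the one in the patch may differ. */
typedef void (*mdread_post_hook_type) (uint8_t *buf, size_t len);

static mdread_post_hook_type mdread_post_hook = NULL;

/* --- Toy extension A: adds 1 to every byte --- */
static mdread_post_hook_type ext_a_prev_hook = NULL;

static void
ext_a_hook(uint8_t *buf, size_t len)
{
    if (ext_a_prev_hook != NULL)
        ext_a_prev_hook(buf, len);      /* let earlier extensions run first */
    for (size_t i = 0; i < len; i++)
        buf[i] = (uint8_t) (buf[i] + 1);
}

/* Analogous to _PG_init(): save the old hook, then install ours. */
static void
ext_a_init(void)
{
    ext_a_prev_hook = mdread_post_hook;
    mdread_post_hook = ext_a_hook;
}

/* --- Toy extension B: doubles every byte --- */
static mdread_post_hook_type ext_b_prev_hook = NULL;

static void
ext_b_hook(uint8_t *buf, size_t len)
{
    if (ext_b_prev_hook != NULL)
        ext_b_prev_hook(buf, len);
    for (size_t i = 0; i < len; i++)
        buf[i] = (uint8_t) (buf[i] * 2);
}

static void
ext_b_init(void)
{
    ext_b_prev_hook = mdread_post_hook;
    mdread_post_hook = ext_b_hook;
}
```

With A loaded before B, a byte value of 3 becomes 4 in A and then 8 in B. Whether an extension calls the previous hook before or after its own work is a design choice; for transforms that must be undone in the reverse of write order, the read-side and write-side hooks need to chain in opposite directions.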
>>>
>>> ------------------------------
>>> 5. Scope
>>>
>>> - *In-Scope:* Backend hook infrastructure, Transformation ID field,
>>> and reference code demonstrating diplomatic protocol compliance.
>>> - *Out-of-Scope:* Specific Key Management Systems (KMS), selection
>>> of specific cryptographic algorithms, and integration with external tools.
>>>
>>> This proposal represents a strategic diplomatic choice: rather than the
>>> PostgreSQL core assuming all security responsibilities, it grants security
>>> experts a *sovereign territory through extensions* where they can
>>> perform at their best.
>>>
>>

Attachment Content-Type Size
v20251228-v3-0001-Add-Storage-I-O-Transform-Hooks-for-PostgreSQL.patch application/octet-stream 14.8 KB
v20251228-v3-0002-Add-test_tde-extension-for-TDE-testing.patch application/octet-stream 51.2 KB
