Re: RFC: PostgreSQL Storage I/O Transformation Hooks

From: Zsolt Parragi <zsolt(dot)parragi(at)percona(dot)com>
To: assam258(at)gmail(dot)com
Cc: Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: RFC: PostgreSQL Storage I/O Transformation Hooks
Date: 2025-12-28 17:55:34
Message-ID: CAN4CZFPALfUFWav-QxhhgkM4hnfBmh9dmnJ2uMsxGhaBa7LDRg@mail.gmail.com
Lists: pgsql-hackers

> - mdread_post_hook: inside the segment loop → Decorator NOT possible

> The mdreadv() function, introduced in PostgreSQL 17 as part of the
> vectored I/O API, processes multiple blocks in a loop that respects
> segment boundaries. The decryption hook must be called inside this loop,
> after each segment's FileReadV() completes. A decorator wrapping mdreadv()
> from the outside cannot access this internal loop timing.

It is possible, or rather it will be: we plan to propose a separate
patch for that. There are already some discussions about the
extensibility of AIO, which is currently quite minimal, and this is
another argument for it. If you look at the AIO sources, they already
use an array of callbacks, and only one small piece is missing: a way
for extensions to add entries to that array. With that patch, you can
decorate smgr_startreadv, register your own callback, and then call
the original mdstartreadv function. Since AIO completion callbacks are
executed in the reverse order of registration, this works out exactly
as needed: the AIO handler first calls the md completion handler, then
yours.
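The ordering argument can be illustrated with a small, self-contained
simulation. Everything in it is hypothetical: IoHandle,
io_register_callback(), io_complete(), md_startreadv() and
tde_startreadv() only mimic the shape of an AIO handle and the smgr
startreadv path, they are not PostgreSQL's real AIO API, and the XOR
"cipher" is a placeholder rather than real encryption. The wrapper
registers its callback before delegating to the md layer, and
completion walks the callbacks in reverse registration order, so the
md handler runs before decryption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for an AIO handle; NOT PostgreSQL's real API. */
#define MAX_IO_CALLBACKS 4
#define SKETCH_BLCKSZ 16    /* tiny block for the sketch; PostgreSQL's default BLCKSZ is 8192 */
#define TOY_KEY 0x5A        /* placeholder XOR "key", NOT real cryptography */

typedef struct IoHandle IoHandle;
typedef void (*io_complete_cb) (IoHandle *ioh);

struct IoHandle
{
    uint8_t        buf[SKETCH_BLCKSZ];
    io_complete_cb callbacks[MAX_IO_CALLBACKS];
    int            ncallbacks;
    char           trace[MAX_IO_CALLBACKS + 1]; /* order in which callbacks ran */
    int            ntrace;
};

static void
io_register_callback(IoHandle *ioh, io_complete_cb cb)
{
    assert(ioh->ncallbacks < MAX_IO_CALLBACKS);
    ioh->callbacks[ioh->ncallbacks++] = cb;
}

/* On completion, run the callbacks in reverse order of registration. */
static void
io_complete(IoHandle *ioh)
{
    for (int i = ioh->ncallbacks - 1; i >= 0; i--)
        ioh->callbacks[i] (ioh);
}

/* md layer: here it only records that it ran; a real one would handle errors and short reads. */
static void
md_readv_complete(IoHandle *ioh)
{
    ioh->trace[ioh->ntrace++] = 'm';
}

static void
md_startreadv(IoHandle *ioh, const uint8_t *on_disk)
{
    io_register_callback(ioh, md_readv_complete);
    memcpy(ioh->buf, on_disk, SKETCH_BLCKSZ);   /* stands in for the asynchronous read */
}

/* Extension callback: decrypt the block after md has finished with it. */
static void
tde_readv_complete(IoHandle *ioh)
{
    for (size_t i = 0; i < SKETCH_BLCKSZ; i++)
        ioh->buf[i] ^= TOY_KEY;
    ioh->trace[ioh->ntrace++] = 'd';
}

/* Decorator: register our callback first, then call the original function. */
static void
tde_startreadv(IoHandle *ioh, const uint8_t *on_disk)
{
    io_register_callback(ioh, tde_readv_complete);
    md_startreadv(ioh, on_disk);
}
```

In this model the decorator never looks inside the md read path, so it
has no segment loop to replicate; it relies only on the callback
ordering.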

My reasoning here is the same as in my previous argument: this AIO
extensibility for startreadv is also needed for other uses of the smgr
extension, most likely by everyone who uses the current patch. It
shouldn't be specific to encryption.
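The "small missing piece" might look roughly like the sketch below: a
fixed table of completion callbacks, plus a registration function that
lets core or an extension append an entry and get back a small integer
ID. The names (IoCallbackEntry, io_callbacks_register(),
io_callbacks_lookup(), io_callbacks_frozen) are hypothetical and do
not describe an existing PostgreSQL API. Keeping an ID rather than a
raw function pointer in the I/O handle is one way to cope with
completion running in another process, assuming every process
registers the same entries in the same order (for example, from a
library loaded via shared_preload_libraries).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical extensible callback table; NOT PostgreSQL's real AIO API. */
#define MAX_IO_CALLBACK_IDS 16

typedef void (*io_cb_fn) (void *io_data);

typedef struct IoCallbackEntry
{
    const char *name;       /* for debugging and error messages */
    io_cb_fn    complete;   /* completion handler */
} IoCallbackEntry;

static IoCallbackEntry io_callbacks[MAX_IO_CALLBACK_IDS];
static int  io_ncallbacks = 0;
static int  io_callbacks_frozen = 0;    /* hypothetical: set once registration must stop */

/*
 * Append an entry and return its ID, or -1 if the table is full or
 * registration is closed. The same sequence of calls in every process
 * yields the same IDs.
 */
static int
io_callbacks_register(const char *name, io_cb_fn complete)
{
    assert(name != NULL && complete != NULL);
    if (io_callbacks_frozen || io_ncallbacks >= MAX_IO_CALLBACK_IDS)
        return -1;
    io_callbacks[io_ncallbacks].name = name;
    io_callbacks[io_ncallbacks].complete = complete;
    return io_ncallbacks++;
}

/* Resolve an ID stored in an I/O handle back to its entry. */
static const IoCallbackEntry *
io_callbacks_lookup(int id)
{
    if (id < 0 || id >= io_ncallbacks)
        return NULL;
    return &io_callbacks[id];
}

/* Example handlers: one "core" entry and one added by an extension. */
static void md_complete(void *io_data)  { *(int *) io_data += 1; }
static void tde_complete(void *io_data) { *(int *) io_data += 10; }
```

Nothing in the table is encryption-specific; any smgr extension that
needs its own completion step could register an entry the same way.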

> With the SMGR decorator approach, the extension developer must:
> - Track upstream md.c changes
> - Replicate the internal loop logic to find the right decryption point

> With hooks, the extension developer only needs to:
> - Implement encrypt() and decrypt()

> We need a simple, stable hook interface that allows local security
> experts to integrate these required algorithms - experts who understand
> cryptography but not PostgreSQL storage internals.

Extension developers still have to understand the multiprocess nature
of PostgreSQL (with AIO, you also have to remember that completion can
happen in a different process, possibly an I/O worker), as well as its
unusual memory management patterns, critical sections, and so on. You
will most likely also have to deal with shared memory caches, locks,
etc.

(And as I said above, you don't have to replicate or track md.c; we
only need a good, generic extension point usable by many extensions.)

> In South Korea, government
> regulations require the use of nationally-approved cryptographic
> algorithms (such as ARIA, SEED). This means organizations often cannot
> adopt foreign TDE solutions, regardless of their technical merit.

Have you considered contributing to existing solutions? Adding support
for multiple algorithms to an existing library is easier than
developing your own from scratch.

> WAL and heap pages are simply different representations of the same
> underlying data. Protecting only one side would be cryptographically
> incomplete; an attacker could bypass encryption by reading the
> unprotected side. Therefore, they must be treated as a single atomic
> unit of protection.

From a security point of view, I agree. From a practical one, it's a
bit more complicated. Since you mentioned South Korean regulations: we
also have regulations in the European Union, and you can comply with
the current ones by encrypting only your data files (at least, that's
what I've heard; I'm not a lawyer).

So from a practical point of view, even getting table encryption hooks
into core would be a success for us.

> My primary concern with using fork files for encryption metadata is crash
> recovery. If a fork file and the actual data page become inconsistent
> (e.g., during a crash), recovery becomes problematic because fork files
> are not typically protected by WAL.

Wouldn't custom WAL records for encryption events (key rotation, key
change, etc.) solve this problem?
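One way to picture this suggestion: WAL-log every change to the
per-block encryption metadata, and during redo apply a record only if
the metadata entry is older than the record, the same LSN interlock
rule PostgreSQL uses for data pages. The sketch below simulates that
with an in-memory metadata map; KeyMeta, KeyChangeRecord,
key_meta_redo() and replay() are hypothetical. A real extension would
presumably emit such records through a custom WAL resource manager
(registered with RegisterCustomRmgr(), available since PostgreSQL 15).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-block encryption metadata, e.g. kept in a separate fork. */
#define NBLOCKS 8

typedef uint64_t XLogPtr;   /* stand-in for an LSN */

typedef struct KeyMeta
{
    uint32_t key_version;   /* key the block is currently encrypted with */
    XLogPtr  lsn;           /* LSN of the last record applied to this entry */
} KeyMeta;

/* Hypothetical custom WAL record: "block N now uses key version V". */
typedef struct KeyChangeRecord
{
    XLogPtr  lsn;
    uint32_t block;
    uint32_t new_key_version;
} KeyChangeRecord;

/*
 * Redo: apply the record only if the entry has not already seen it.
 * This makes replay idempotent, whatever state the metadata was in
 * when the crash happened.
 */
static void
key_meta_redo(KeyMeta *meta, const KeyChangeRecord *rec)
{
    assert(rec->block < NBLOCKS);
    KeyMeta *m = &meta[rec->block];

    if (m->lsn >= rec->lsn)
        return;             /* already applied before the crash */
    m->key_version = rec->new_key_version;
    m->lsn = rec->lsn;
}

/* Replay a stretch of WAL in LSN order. */
static void
replay(KeyMeta *meta, const KeyChangeRecord *recs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        key_meta_redo(meta, &recs[i]);
}
```

In this model the metadata does not have to be crash-safe on its own:
as with heap pages, WAL replay brings it back to a consistent state.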

> I plan to propose a separate RFC for this
> "gradual rotation" mechanism.

Would this gradual rotation mechanism be useful for anything other
than encryption extensions? I had the same idea, but I don't see how
it would be useful for anything else, so I didn't plan to submit any
patches related to it. This is something that can easily be
implemented as a background worker in a TDE extension, and it doesn't
really require core support.
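To illustrate why this does not seem to need core support: the heart
of such a worker is a resumable loop that re-encrypts a bounded batch
of blocks per iteration and advances a cursor, so it can be throttled,
paused, or restarted. The sketch is hypothetical (Block,
RotationState, rotate_step() and the toy XOR re-encryption are
illustrative only); a real extension would run it from a background
worker registered with RegisterBackgroundWorker() and would go through
the buffer manager and WAL instead of touching an array directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BLCKSZ 16    /* tiny block for the sketch */

typedef struct Block
{
    uint32_t key_version;
    uint8_t  data[SKETCH_BLCKSZ];
} Block;

typedef struct RotationState
{
    uint32_t target_version;    /* key version we are rotating to */
    size_t   cursor;            /* next block to examine; persist it to resume */
} RotationState;

/* Toy per-version key: NOT real cryptography, only for the simulation. */
static uint8_t
toy_key(uint32_t version)
{
    return (uint8_t) (0x30 + version);
}

static void
reencrypt(Block *b, uint32_t new_version)
{
    uint8_t old_key = toy_key(b->key_version);
    uint8_t new_key = toy_key(new_version);

    /* XOR with the old key decrypts, XOR with the new key encrypts */
    for (size_t i = 0; i < SKETCH_BLCKSZ; i++)
        b->data[i] = (uint8_t) (b->data[i] ^ old_key ^ new_key);
    b->key_version = new_version;
}

/*
 * Examine at most `batch` blocks starting at the cursor, re-encrypting
 * those not yet on the target key. Returns true once the cursor has
 * passed the last block.
 */
static bool
rotate_step(RotationState *st, Block *blocks, size_t nblocks, size_t batch)
{
    size_t done = 0;

    while (st->cursor < nblocks && done < batch)
    {
        Block *b = &blocks[st->cursor++];

        if (b->key_version != st->target_version)
            reencrypt(b, st->target_version);
        done++;
    }
    return st->cursor >= nblocks;
}
```

Each call does a bounded amount of work, so the worker can sleep
between batches to limit the I/O impact, and the persisted cursor lets
it pick up where it left off after a restart.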
