Re: Moving forward with TDE [PATCH v3]

From: Andres Freund <andres(at)anarazel(dot)de>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: David Christensen <david(dot)christensen(at)crunchydata(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Aleksander Alekseev <aleksander(at)timescale(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>
Subject: Re: Moving forward with TDE [PATCH v3]
Date: 2023-11-08 00:47:12
Message-ID: 20231108004712.hhnrg7wad2pnile2@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2023-11-06 11:26:44 +0100, Matthias van de Meent wrote:
> On Sat, 4 Nov 2023 at 03:38, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > On 2023-11-02 22:09:40 +0100, Matthias van de Meent wrote:
> > > I'm quite surprised at the significant number of changes being made
> > > outside the core storage manager files. I thought that changing out
> > > mdsmgr with an encrypted smgr (that could wrap mdsmgr if so desired)
> > > would be the most obvious change to implement cluster-wide encryption
> > > with the least code touched, as relations don't need to know whether
> > > the files they're writing are encrypted, right? Is there a reason to
> > > not implement this at the smgr level that I overlooked in the
> > > documentation of these patches?
> >
> > You can't really implement encryption transparently inside an smgr without
> > significant downsides. You need a way to store an initialization vector
> > associated with the page (or you can store that elsewhere, but then you've
> > doubled the worst cse amount of random reads/writes). The patch uses the LSN
> > as the IV (which I doubt is a good idea). For authenticated encryption further
> > additional storage space is required.
>
> I am unaware of any user of the smgr API that doesn't also use the
> buffer cache, and thus implicitly the Page layout with PageHeader
> [^1]

Everything indeed uses a PageHeader - but there are a number of places that do
*not* utilize pd_lower/upper/special. E.g. visibilitymap.c just assumes that
those fields are zero - and changing that wouldn't be trivial / free, because
we do a lot of bitmasking/shifting with constants derived from

#define MAPSIZE (BLCKSZ - MAXALIGN(SizeOfPageHeaderData))

which obviously wouldn't be constant anymore if you could reserve space on the
page.

> The API of smgr is also tailored to page-sized quanta of data
> with mostly relation-level information. I don't see why there would be
> a veil covering the layout of Page for smgr when all other information
> already points to the use of PageHeader and Page layouts. In my view,
> it would even make sense to allow the smgr to get exclusive access to
> some part of the page in the current Page layout.
>
> Yes, I agree that there will be an impact on usable page size if you
> want authenticated encryption, and that AMs will indeed need to
> account for storage space now being used by the smgr - inconvenient,
> but it serves a purpose. That would happen regardless of whether smgr
> or some higher system decides where to store the data for encryption -
> as long as it is on the page, the AM effectively can't use those
> bytes.
> But I'd say that's best solved by making the Page documentation and
> PageInit API explicit about the potential use of that space by the
> chosen storage method (encrypted, plain, ...) instead of requiring the
> various AMs to manually consider encryption when using Postgres' APIs
> for writing data to disk without hitting shared buffers; page space
> management is already a task of AMs, but handling the actual
> encryption is not.

I don't particularly disagree with any detail here - but to me reserving space
for nonces etc at PageInit() time pretty much is the opposite of handling
encryption inside smgr.

> Should the AM really care whether the data on disk is encrypted or
> not? I don't think so. When the disk contains encrypted bytes, but
> smgrread() and smgrwrite() both produce and accept plaintext data,
> who's going to complain? Requiring AMs to be mindful about encryption
> on all common paths only adds pitfalls where encryption would be
> forgotten by the developer of AMs in one path or another.

I agree with that - I think the way the patch currently is designed is not
right.

There's other stuff you can't trivially do at the smgr level. E.g. if
checksums or encryption is enabled, you need to copy the buffer to compute
checksums / do IO if in shared buffers, because somebody could set a hint bit
even with just a shared content lock. But you don't need that when coming from
private buffers during index builds.

> I think that getting PageInit to allocate the smgr-specific area would
> take some effort, too (which would potentially require adding some
> relational context to PageInit, so that it knows which page of which
> relation it is going to initialize), but IMHO that would be more
> natural than requiring all index and table AMs to be aware the actual
> encryption of its pages and require manual handling of that encryption
> when the page needs to be written to disk, when it otherwise already
> conforms to the various buffer management and file extension APIs
> currently in use in PostgreSQL. I would expect "transparent" data
> encryption to be handled at the file write layer (i.e. smgr), not
> inside the AMs.

As mentioned above - I agree that the relevant code shouldn't be in index
AMs. But I somewhat doubt that smgr is the right level either. For one, the
place computing checksums needs awareness of locking / sharing semantics as
well as knowledge about WAL logging. That's IMO above smgr. For another, if we
ever got another smgr implementation - should it have to reimplement
encryption?

ISTM that there's a layer missing. Places bypassing bufmgr.c currently need
their own handling of checksums (and in the future encryption), because the
relevant bufmgr.c code can't be reached without pages in the buffer
pool. Which is why we have PageIsVerifiedExtended() and
PageSetChecksumInplace() calls in gist, hash, heapam, nbtree ... IMO when
checksums were added, we should have added the proper abstraction layer
instead of littering the code with redundant copies.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2023-11-08 00:52:16 Re: Show WAL write and fsync stats in pg_stat_io
Previous Message Michael Paquier 2023-11-08 00:35:41 Re: Call pqPipelineFlush from PQsendFlushRequest