Quick Links

Re: Moving forward with TDE [PATCH v3]

From:	David Christensen <david(dot)christensen(at)crunchydata(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Aleksander Alekseev <aleksander(at)timescale(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>
Subject:	Re: Moving forward with TDE [PATCH v3]
Date:	2023-11-08 22:47:06
Message-ID:	CAOxo6X+RnY4zh=2M5V5NnurG1B8Xin-T838BOk7jpbS_bYtgLg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Tue, Nov 7, 2023 at 6:47 PM Andres Freund <andres(at)anarazel(dot)de> wrote:

> Hi,
>
> On 2023-11-06 11:26:44 +0100, Matthias van de Meent wrote:
> > On Sat, 4 Nov 2023 at 03:38, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > On 2023-11-02 22:09:40 +0100, Matthias van de Meent wrote:
> > > > I'm quite surprised at the significant number of changes being made
> > > > outside the core storage manager files. I thought that changing out
> > > > mdsmgr with an encrypted smgr (that could wrap mdsmgr if so desired)
> > > > would be the most obvious change to implement cluster-wide encryption
> > > > with the least code touched, as relations don't need to know whether
> > > > the files they're writing are encrypted, right? Is there a reason to
> > > > not implement this at the smgr level that I overlooked in the
> > > > documentation of these patches?
> > >
> > > You can't really implement encryption transparently inside an smgr
> without
> > > significant downsides. You need a way to store an initialization vector
> > > associated with the page (or you can store that elsewhere, but then
> you've
> > > doubled the worst cse amount of random reads/writes). The patch uses
> the LSN
> > > as the IV (which I doubt is a good idea). For authenticated encryption
> further
> > > additional storage space is required.
> >
> > I am unaware of any user of the smgr API that doesn't also use the
> > buffer cache, and thus implicitly the Page layout with PageHeader
> > [^1]
>
> Everything indeed uses a PageHeader - but there are a number of places
> that do
> *not* utilize pd_lower/upper/special. E.g. visibilitymap.c just assumes
> that
> those fields are zero - and changing that wouldn't be trivial / free,
> because
> we do a lot of bitmasking/shifting with constants derived from
>
> #define MAPSIZE (BLCKSZ - MAXALIGN(SizeOfPageHeaderData))
>
> which obviously wouldn't be constant anymore if you could reserve space on
> the
> page.
>

While not constants, I was able to get this working with variable values
here in a way that did not have the overhead of the original patch for the
vismap specifically using Montgomery Multiplication for division/mod. This
was actually the heaviest of the changes from moving to runtime-calculated,
so we might be able to use this approach in this specific case even if only
this change is required for this specific fork.

> > The API of smgr is also tailored to page-sized quanta of data
> > with mostly relation-level information. I don't see why there would be
> > a veil covering the layout of Page for smgr when all other information
> > already points to the use of PageHeader and Page layouts. In my view,
> > it would even make sense to allow the smgr to get exclusive access to
> > some part of the page in the current Page layout.
> >
> > Yes, I agree that there will be an impact on usable page size if you
> > want authenticated encryption, and that AMs will indeed need to
> > account for storage space now being used by the smgr - inconvenient,
> > but it serves a purpose. That would happen regardless of whether smgr
> > or some higher system decides where to store the data for encryption -
> > as long as it is on the page, the AM effectively can't use those
> > bytes.
> > But I'd say that's best solved by making the Page documentation and
> > PageInit API explicit about the potential use of that space by the
> > chosen storage method (encrypted, plain, ...) instead of requiring the
> > various AMs to manually consider encryption when using Postgres' APIs
> > for writing data to disk without hitting shared buffers; page space
> > management is already a task of AMs, but handling the actual
> > encryption is not.
>
> I don't particularly disagree with any detail here - but to me reserving
> space
> for nonces etc at PageInit() time pretty much is the opposite of handling
> encryption inside smgr.
>

Originally, I was anticipating that we might want different space amounts
reserved on different classes of pages (apart from encryption), so while
we'd be storing the default page reserved size in pg_control we'd not be
limited to this in the structure of the page calls. We could presumably
just move the logic into PageInit() itself if every reserved allocation is
the same and individual call sites wouldn't need to know about it. The
call sites do have more context as to the requirements of the page or the
"type" of page in play, which if we made it dependent on page type would
need to get passed in somehow, which was where the reserved_page_size
parameter came in to the current patch.

>
> > Should the AM really care whether the data on disk is encrypted or
> > not? I don't think so. When the disk contains encrypted bytes, but
> > smgrread() and smgrwrite() both produce and accept plaintext data,
> > who's going to complain? Requiring AMs to be mindful about encryption
> > on all common paths only adds pitfalls where encryption would be
> > forgotten by the developer of AMs in one path or another.
>
> I agree with that - I think the way the patch currently is designed is not
> right.
>

The things that need to care tend to be the same places that need to care
about setting checksums, that or being aware of the LSNs in play or needing
to be set. I'd agree that a common interface for "get this page ready for
writing to storage" and "get this page converted from storage" which could
handle both checksums, encryption, or additional page features would make
sense. (I doubt we'd want hooks to support page in/page out, but if we
/did/ want that, this'd also likely live there.)

There's other stuff you can't trivially do at the smgr level. E.g. if
> checksums or encryption is enabled, you need to copy the buffer to compute
> checksums / do IO if in shared buffers, because somebody could set a hint
> bit
> even with just a shared content lock. But you don't need that when coming
> from
> private buffers during index builds.
>
>
>
> > I think that getting PageInit to allocate the smgr-specific area would
> > take some effort, too (which would potentially require adding some
> > relational context to PageInit, so that it knows which page of which
> > relation it is going to initialize), but IMHO that would be more
> > natural than requiring all index and table AMs to be aware the actual
> > encryption of its pages and require manual handling of that encryption
> > when the page needs to be written to disk, when it otherwise already
> > conforms to the various buffer management and file extension APIs
> > currently in use in PostgreSQL. I would expect "transparent" data
> > encryption to be handled at the file write layer (i.e. smgr), not
> > inside the AMs.
>
> As mentioned above - I agree that the relevant code shouldn't be in index
> AMs. But I somewhat doubt that smgr is the right level either. For one, the
> place computing checksums needs awareness of locking / sharing semantics as
> well as knowledge about WAL logging. That's IMO above smgr. For another,
> if we
> ever got another smgr implementation - should it have to reimplement
> encryption?
>
> ISTM that there's a layer missing. Places bypassing bufmgr.c currently need
> their own handling of checksums (and in the future encryption), because the
> relevant bufmgr.c code can't be reached without pages in the buffer
> pool. Which is why we have PageIsVerifiedExtended() and
> PageSetChecksumInplace() calls in gist, hash, heapam, nbtree ... IMO when
> checksums were added, we should have added the proper abstraction layer
> instead of littering the code with redundant copies.

Agreed that a fixup patch to add /something/ here would be good. A concern
here is what context would need to be passed in; certainly with only
checksums just a Page is sufficient, but if we have AAD we'd need to be
able to pass that in or otherwise be able to identify it in the page. As a
point of references, the existing GCM patch authenticates all unencrypted
header fields (i.e., PageHeaderData up to the pd_special field), plus the
RelFileNumber and BlockNumber, of which we'd need to pass in RelFileNumber
and BlockNumber (and presumably would want the ForkNum as well if expanding
the set of authenticated data). To some extent, we can punt on some of
this, as the existing call sites have been modified in this patch to pass
that info in already, so it's really about how we marshal that data.

Thanks,

David

In response to

Re: Moving forward with TDE [PATCH v3] at 2023-11-08 00:47:12 from Andres Freund

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Peter Smith	2023-11-08 22:53:55	Re: GUC names in messages
Previous Message	Tom Lane	2023-11-08 22:44:55	Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500