Re: Key management with tests

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Kincaid <tomjohnkincaid(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Bruce Momjian <bruce(at)momjian(dot)us>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
Subject: Re: Key management with tests
Date: 2021-01-18 21:38:47
Message-ID: CA+TgmobUS2Y2OXRuQGs_i1ReJO2fJmLKm=sZ1+aGRqc42oPYPQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jan 18, 2021 at 2:00 PM Tom Kincaid <tomjohnkincaid(at)gmail(dot)com> wrote:
> Some of the missing things you mention above are about the design of
> TDE feature in general. However, this patch is about Key Management
> which is going part of the larger TDE feature. So it feels as though
> there is the need for a general design document about the overall
> vision / approach for TDE and a specific design doc. for Key
> Management. Is it appropriate to include both of those in the same
> patch?

To me, it wouldn't make sense to commit a full README for a TDE
feature that we don't have yet with a key management patch, but the
way that they'll interact with each other has to be clear. The
doc/database-encryption.sgml file that Bruce included in the patch is
a decent start on explaining the design, though I think it needs more
work and more details, perhaps including some of the things Andres
mentioned.

To be honest, after reading over that SGML documentation a bit, I'm
somewhat skeptical about whether it really makes sense to think about
committing the key management part separately. It seems to have no use
independent of the main feature, and it in fact embeds very specific
details of how the main feature is expected to work. For example, the
documentation says that key #0 will be used for data files, and key #1
for WAL. There seems to be no suggestion that the key management
portion of this can be used to manage encryption keys generally for
whatever purposes someone might have; it's all about the needs of a
particular TDE implementation. Typically, we would not commit
something like that separately, or only once the main patch was done,
with the two commits occurring in a relatively short time period.
Otherwise, as Bruce already noted, we can end up with something that
is documented and visible to users but doesn't actually work yet.

Some more specific comments on data-encryption.sgml:

* The documentation explains that the purpose of having a WAL key
separate from the data file key is so that the data file keys can
"eventually" be rotated. It's not clear whether this means that we
might eventually have that feature or that we might eventually be able
to rotate, after failing over. If this kind of thing is possible,
we'll eventually need documentation on how to do it.

* The reasons for use a DEK and a KEK are not explained. I realize
it's not an uncommon practice and that other systems do it, but I
think a few sentences of explanation wouldn't be a bad idea. Even if
we are supposing that hackers who want to have input into this feature
have to be knowledgeable about cryptography, I don't think we can
reasonably suppose that for users.

* "For example" is at one point followed by a period rather than a
colon or comma.

* In the "Internals" subsection, the last sentence doesn't seem to be
grammatical. I wonder if it's missing the word "or"'.

* The part about integrity-checking keys on startup isn't clear. It
makes it sound like we still have a copy of the KEK lying around
someplace against which we can compare, which I assume is not the case
since it would be really insecure.

* I think it's going to be pretty important that we can easily switch
to other cryptographic algorithms as they are discovered, so I don't
like the fact that this is tied specifically to AES. (In fact,
kmgr_utils.h makes it sound like we're specifically locked into
AES256, but that contradicts the documentation, so I guess there's
some clarification needed here about what exactly KMGR_CLUSTER_KEY_LEN
is doing.) As far as possible we should try to make this generic, like
supporting any cipher that SSL has which has property X. It seems
relatively inevitable that every currently popular cryptographic
algorithm will at some point in the future be judged weak and
worthless, just as has already happened with MD5 and some variants of
SHA, both of which used to be considered state of the art. It seems
equally inevitable that new and stronger algorithms will continued to
be devised, and we'll want to adopt those easily.

I'm not sure to what extent this a serious flaw in the patch and to
what extent it's just a matter of tweaking the wording of some things,
but I think this is actually an extremely critical design point where
we had better be certain we've got it right. Few things would be
sadder than to get a TDE feature and then have to rip it out again
because it couldn't be upgraded to work with newer crypto algorithms
with reasonable effort.

Notes on other parts of the documentation:

* The documentation for initdb -K doesn't list the valid values of the
parameter, only the default. Probably we should be specifying an
algorithm here and not just a bit count. Otherwise, like I say above,
what happens when AES gives way to something else? It'd be easy to say
-K BFT256 instead of -K AES256, but if AES is assumed and it's no
longer what we want them we have problems. This kind of thing probably
needs to be cleaned up in a bunch of places.

* I don't see the point of saying "a passphrase or PIN." We don't need
to document that your passphrase might happen to only contain digits.

* pg_alterckey's description of "repair" is hard to understand. It
doesn't really explain why or how this would be necessary, and it begs
the question of why we'd ever leave things in a state that requires
repair. This is sketched out in code comments elsewhere, but I think
at least some of it needs to be explained in the documentation as
well. (Incidentally, I don't think the comments at the top of
recover_failure will survive a visit from pgindent, though I might be
wrong about that.)

* The changes to config.sgml say "Sample script" instead of "Sample scripts".

* I don't think that the documentation of %R is very clear, or
adequate for someone to make effective use of it. If I wanted to use
%R, how would I ensure that a value is available?

* The changes to allfiles.sgml add pg_alterckey.sgml in the wrong
place and include an incorrect whitespace change.

* It's odd that "pg_alterckey" describes itself as "technically"
changing the KEK. Isn't that just what it does, not a technicality? I
imagine we'll ultimately need a way to change a DEK as well, because
otherwise the use of a separate key for the WAL wouldn't accomplish
the intended goal.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2021-01-18 21:42:46 Re: CheckpointLock needed in CreateCheckPoint()?
Previous Message Peter Geoghegan 2021-01-18 21:32:52 Re: Deleting older versions in unique indexes to avoid page splits