Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Joe Conway <mail(at)joeconway(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Antonin Houska <ah(at)cybertec(dot)at>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, "Moon, Insung" <Moon_Insung_i3(at)lab(dot)ntt(dot)co(dot)jp>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
Date: 2019-07-08 19:30:03
Message-ID: 20190708193003.mt2lrr5rm3dz3ffu@development
Lists: pgsql-hackers

On Mon, Jul 08, 2019 at 02:39:44PM -0400, Stephen Frost wrote:
>Greetings,
>
>* Bruce Momjian (bruce(at)momjian(dot)us) wrote:
>> On Mon, Jul 8, 2019 at 11:47:33AM -0400, Stephen Frost wrote:
>> > * Bruce Momjian (bruce(at)momjian(dot)us) wrote:
>> > > On Mon, Jul 8, 2019 at 11:18:01AM -0400, Joe Conway wrote:
>> > > > On 7/8/19 10:19 AM, Bruce Momjian wrote:
>> > > > > When people are asking for multiple keys (not just for key rotation),
>> > > > > they are asking to have multiple keys that can be supplied by users only
>> > > > > when they need to access the data. Yes, the keys are always in the
>> > > > > database, but the feature request is that they are only unlocked when the
>> > > > > user needs to access the data. Obviously, that will not work for
>> > > > > autovacuum when the encryption is at the block level.
>> > > >
>> > > > > If the key is always unlocked, there is questionable security value of
>> > > > > having multiple keys, beyond key rotation.
>> > > >
>> > > > That is not true. Having multiple keys also allows you to reduce the
>> > > > amount of data encrypted with a single key, which is desirable because:
>> > > >
>> > > > 1. It makes cryptanalysis more difficult
>> > > > 2. Puts less data at risk if someone gets "lucky" in doing brute force
>> > >
>> > > What systems use multiple keys like that? I know of no website that
>> > > does that. Your arguments seem hypothetical. What is your goal here?
>> >
>> > Not sure what the reference to 'website' is here, but one doesn't get
>> > certificates for TLS/SSL usage that aren't time-bounded, and when it
>> > comes to the actual on-the-wire encryption that's used, that's a
>> > symmetric key that's generated on-the-fly for every connection.
>> >
>> > Wouldn't the fact that they generate a different key for every
>> > connection be a pretty clear indication that it's a good idea to use
>> > multiple keys and not use the same key over and over..?
>> >
>> > Of course, we can discuss if what websites do with over-the-wire
>> > encryption is sensible to compare to what we want to do in PG for
>> > data-at-rest, but then we shouldn't be talking about what websites do,
>> > it'd make more sense to look at other data-at-rest encryption systems
>> > and consider what they're doing.
>>
>> (I talked to Joe on chat for clarity.) In modern TLS, the certificate is
>> used only for authentication, and Diffie–Hellman is used for key
>> exchange:
>>
>> https://en.wikipedia.org/wiki/Diffie%E2%80%93Hellman_key_exchange
>
>Right, and the key that's figured out for each connection is at least
>specific to the server AND client keys/certificates, thus meaning that
>they're changed at least as frequently as those change (and clients end
>up creating ones on the fly randomly if they don't have one, iirc).
>
>> So, the question is whether you can pass so much data in TLS that using
>> the same key for the entire session is a security issue. TLS originally
>> had key renegotiation, but that was removed in TLS 1.3:
>>
>> https://www.cloudinsidr.com/content/known-attack-vectors-against-tls-implementation-vulnerabilities/
>> To mitigate these types of attacks, TLS 1.3 disallows renegotiation.
>
>It was removed due to attacks targeting the renegotiation, not because
>doing re-keying by itself was a bad idea, or because using multiple keys
>was seen as a bad idea.
>
>> Of course, a database is going to process even more data so if the
>> amount of data encrypted is a problem, we might have a problem too in
>> using a single key. This is not related to whether we use one key for
>> the entire cluster or multiple keys per tablespace --- the problem is
>> the same. I guess we could create 1024 keys and use the bottom bits of
>> the block number to decide what key to use. However, that still only
>> pushes the goalposts farther.
>
>All of this is about pushing the goalposts farther away, as I see it.
>There's going to be trade-offs here and there isn't going to be any "one
>right answer" when it comes to this space. That's why I'm inclined to
>argue that we should try to come up with a relatively *good* solution
>that doesn't create a huge amount of work for us, and then build on
>that. To that end, leveraging metadata that we already have outside of
>the catalogs (databases, tablespaces, potentially other information that
>we store, essentially, in the filesystem metadata already) to decide on
>what key to use, and how many we can support, strikes me as a good
>initial target.
>
>> Anyway, I will research the reasonable data size that can be secured
>> with a single key via AES. I will look at how PGP encrypts large files
>> too.
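
(Rough numbers on that, for what it's worth: AES has a 128-bit block, so the
generic birthday bound for block-cipher modes is around 2^64 blocks under a
single key, i.e. roughly

    2^64 blocks * 16 bytes/block = 2^68 bytes ~ 256 EiB

or about 2^55 8kB pages. In practice the limit people tend to hit first is
nonce/IV management for the chosen mode -- e.g. AES-GCM with random 96-bit
IVs is commonly capped at 2^32 encryptions per key -- rather than the raw
birthday bound.)
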
>
>This seems unlikely to lead to a definitive result, but it would be
>interesting to hear if there have been studies around that and what
>their conclusions were.
>
>When it comes to concerns about autovacuum or other system processes,
>those don't have any direct user connections or interactions, so having
>them be more privileged and having access to more is reasonable.
>

I think Bruce's proposal was to minimize the time the keys are "unlocked"
in memory, by only unlocking them when the user connects and supplies some
sort of secret (passphrase), and removing them from memory when the user
disconnects. So there's no way for the auxiliary processes to gain access
to those keys, because only the user knows the secret.
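
Roughly something like this, as I understand it (a toy lifecycle sketch,
all names made up, and the XOR "unwrap" is only a stand-in for real key
wrapping such as AES-KW):

    import hashlib

    class SessionKeyCache:
        """Per-session data keys, in memory only while the session lives."""

        def __init__(self):
            self._keys = {}   # session_id -> {table_oid: data_key}

        def on_connect(self, session_id, passphrase, wrapped_keys):
            # Turn the user-supplied secret into a key-encryption key.
            kek = hashlib.pbkdf2_hmac("sha256", passphrase.encode(),
                                      b"demo-salt", 100_000)
            # Real code would unwrap with AES-KW or similar; XOR against
            # the KEK just keeps this sketch dependency-free.
            self._keys[session_id] = {
                oid: bytes(a ^ b for a, b in zip(wrapped, kek))
                for oid, wrapped in wrapped_keys.items()
            }

        def on_disconnect(self, session_id):
            # Keys vanish with the session, so autovacuum and other
            # background processes never see them.
            self._keys.pop(session_id, None)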

FWIW I have doubts this scheme actually measurably improves privacy in
practice, because most busy applications will end up having the keys in
memory all the time anyway.

It also assumes memory is unsafe, i.e. that bad actors can read it, and
that's probably a valid concern (root access, vulnerabilities, etc.). But in
that case we already have plenty of issues with data in flight anyway, and I
doubt TDE is an answer to that.

>Ideally, all of this would leverage a vaulting system or other mechanism
>which manages access to the keys and allows their usage to be limited.
>That's been generally accepted as a good way to bridge the gap between
>having to ask users every time for a key and having keys stored
>long-term in memory.

Right. I agree with this.
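
The usual pattern is something like the following -- fetch the key from the
vault/KMS on demand and keep it only for a short lease (purely illustrative,
the fetch callback stands in for whatever KMS protocol ends up being used):

    import time

    class LeasedKeyCache:
        def __init__(self, fetch_key, lease_seconds=300):
            self._fetch_key = fetch_key   # callback that talks to the vault/KMS
            self._lease = lease_seconds
            self._cache = {}              # key_id -> (key_bytes, expires_at)

        def get(self, key_id):
            key, expires = self._cache.get(key_id, (None, 0.0))
            if key is None or time.monotonic() >= expires:
                key = self._fetch_key(key_id)   # may prompt, hit the vault, ...
                self._cache[key_id] = (key, time.monotonic() + self._lease)
            return key

        def evict_expired(self):
            now = time.monotonic()
            self._cache = {k: v for k, v in self._cache.items() if v[1] > now}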

>Having *only* the keys for the data which the
>currently connected user is allowed to access would certainly be a great
>initial capability, even if system processes (including potentially WAL
>replay) have to have access to all of the keys. And yes, shared buffers
>being unencrypted and accessible by every backend continues to be an
>issue- it'd be great to improve on that situation too. I don't think
>having everything encrypted in shared buffers is likely the solution,
>rather, segregating it up might make more sense, again, along similar
>lines to keys and using metadata that's outside of the catalogs, which
>has been discussed previously, though I don't think anyone's actively
>working on it.
>

I very much doubt TDE is a solution to this. Essentially, TDE is a good
data-at-rest solution, but this seems more like protecting data during
execution. And in that case I think we may need an entirely different
encryption scheme.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
