From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Joe Conway <mail(at)joeconway(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Antonin Houska <ah(at)cybertec(dot)at>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, "Moon, Insung" <Moon_Insung_i3(at)lab(dot)ntt(dot)co(dot)jp>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
Date: 2019-07-08 21:18:11
Message-ID: 20190708211811.sio5o36zxhps7snx@momjian.us
Lists: pgsql-hackers

On Mon, Jul 8, 2019 at 02:39:44PM -0400, Stephen Frost wrote:
> > > Of course, we can discuss if what websites do with over-the-wire
> > > encryption is sensible to compare to what we want to do in PG for
> > > data-at-rest, but then we shouldn't be talking about what websites do,
> > > it'd make more sense to look at other data-at-rest encryption systems
> > > and consider what they're doing.
> >
> > (I talked to Joe on chat for clarity.) In modern TLS, the certificate is
> > used only for authentication, and Diffie–Hellman is used for key
> > exchange:
> >
> > https://en.wikipedia.org/wiki/Diffie%E2%80%93Hellman_key_exchange
>
> Right, and the key that's figured out for each connection is at least
> specific to the server AND client keys/certificates, thus meaning that
> they're changed at least as frequently as those change (and clients end
> up creating ones on the fly randomly if they don't have one, iirc).
>
> > So, the question is whether you can pass so much data in TLS that using
> > the same key for the entire session is a security issue. TLS originally
> > had key renegotiation, but that was removed in TLS 1.3:
> >
> > https://www.cloudinsidr.com/content/known-attack-vectors-against-tls-implementation-vulnerabilities/
> > To mitigate these types of attacks, TLS 1.3 disallows renegotiation.
>
> It was removed due to attacks targeting the renegotiation, not because
> doing re-keying by itself was a bad idea, or because using multiple keys
> was seen as a bad idea.

Well, if it was a necessary feature, I assume TLS 1.3 would have found
a way to make it secure, no?  Certainly they are not shipping TLS 1.3
with a known weakness.

> > Of course, a database is going to process even more data so if the
> > amount of data encrypted is a problem, we might have a problem too in
> > using a single key. This is not related to whether we use one key for
> > the entire cluster or multiple keys per tablespace --- the problem is
> > the same. I guess we could create 1024 keys and use the bottom bits of
> > the block number to decide what key to use. However, that still only
> > pushes the goalposts farther.
>
> All of this is about pushing the goalposts farther away, as I see it.
> There's going to be trade-offs here and there isn't going to be any "one
> right answer" when it comes to this space. That's why I'm inclined to
> argue that we should try to come up with a relatively *good* solution
> that doesn't create a huge amount of work for us, and then build on
> that. To that end, leveraging metadata that we already have outside of
> the catalogs (databases, tablespaces, potentially other information that
> we store, essentially, in the filesystem metadata already) to decide on
> what key to use, and how many we can support, strikes me as a good
> initial target.

Yes, we will need that for a usable nonce that we don't need to store in
the blocks and WAL files.
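To make the metadata idea concrete, here is a minimal sketch of deriving a per-tablespace/per-database data key from a single master key.  This is purely illustrative: the HMAC-based KDF, the function name, and the info-string layout are my assumptions, not anything settled in this thread.

```python
import hashlib
import hmac

def derive_key(master_key: bytes, tablespace_oid: int, database_oid: int) -> bytes:
    """Illustrative only: derive a data key from a master key using
    HMAC-SHA256 as a KDF, keyed on metadata we already have in the
    filesystem layout (tablespace and database OIDs), so no catalog
    access is needed to pick the right key."""
    info = f"tde:{tablespace_oid}:{database_oid}".encode()
    return hmac.new(master_key, info, hashlib.sha256).digest()

# Different databases in the same tablespace get distinct keys.
k1 = derive_key(b"k" * 32, 1663, 16384)
k2 = derive_key(b"k" * 32, 1663, 16385)
```

The point of deriving rather than storing per-object keys is that only the master key needs protection, and the key for any file can be recomputed from path components alone.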

> > Anyway, I will research the reasonable data size that can be secured
> > with a single key via AES. I will look at how PGP encrypts large files
> > too.
>
> This seems unlikely to lead to a definitive result, but it would be
> interesting to hear if there have been studies around that and what
> their conclusions were.

I found this:

https://crypto.stackexchange.com/questions/44113/what-is-a-safe-maximum-message-size-limit-when-encrypting-files-to-disk-with-aes
https://crypto.stackexchange.com/questions/20333/encryption-of-big-files-in-java-with-aes-gcm/20340#20340

The numbers listed are:

Maximum Encrypted Plaintext Size: 68GB
Maximum Processed Additional Authenticated Data: 2 x 10^18

The 68GB value is "the maximum bits that can be processed with a single
key/IV(nonce) pair."  We would only encrypt 8k of data for each 8k page,
so if we assume a unique nonce per page, that gives us 10^32 bytes
before nonce reuse becomes a concern.
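As a back-of-the-envelope check, the 68GB figure comes from the AES-GCM limit of 2^39 - 256 bits of plaintext per key/IV pair; the arithmetic below just restates that limit in bytes and 8k pages:

```python
# AES-GCM allows at most 2^39 - 256 bits of plaintext per key/IV pair.
max_bits = 2**39 - 256
max_bytes = max_bits // 8          # the "68GB" figure from the links above

PAGE_SIZE = 8192                   # PostgreSQL heap page size
pages_per_pair = max_bytes // PAGE_SIZE

# With a unique nonce per page, each key/IV pair only ever sees one
# 8k page of plaintext, so we stay far below the limit.
print(max_bytes, pages_per_pair)
```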

For the WAL we would probably use a different nonce for each 16MB
segment, so we would be OK there too, since that gives us 10^36 bytes
before the segment number causes the nonce to repeat.
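One possible nonce layout, packing identifiers we already have into the standard 96-bit GCM nonce, might look like the following.  The field choices and widths here are my assumptions for illustration, not a design from this thread:

```python
import struct

def page_nonce(relfilenode: int, fork: int, block: int) -> bytes:
    """Hypothetical 96-bit page nonce: 32-bit relfilenode, 32-bit fork
    number, 32-bit block number.  Unique per page as long as the same
    (relfilenode, fork, block) triple is never reused under one key."""
    return struct.pack(">III", relfilenode, fork, block)

def wal_nonce(timeline: int, segment: int) -> bytes:
    """Hypothetical 96-bit WAL nonce: 32-bit timeline ID plus 64-bit
    segment number, one nonce per 16MB segment."""
    return struct.pack(">IQ", timeline, segment)
```

Because the nonce is derived from values already present in the file layout, nothing extra has to be stored in the blocks or WAL, which is the property Stephen's metadata suggestion is after.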

> When it comes to concerns about autovacuum or other system processes,
> those don't have any direct user connections or interactions, so having
> them be more privileged and having access to more is reasonable.

Well, I am trying to understand the value of having some keys accessible
by some parts of the system and some not; I am unclear on what security
value that provides.

> Ideally, all of this would leverage a vaulting system or other mechanism
> which manages access to the keys and allows their usage to be limited.
> That's been generally accepted as a good way to bridge the gap between
> having to ask users every time for a key and having keys stored
> long-term in memory. Having *only* the keys for the data which the
> currently connected user is allowed to access would certainly be a great
> initial capability, even if system processes (including potentially WAL
> replay) have to have access to all of the keys. And yes, shared buffers
> being unencrypted and accessible by every backend continues to be an
> issue- it'd be great to improve on that situation too. I don't think
> having everything encrypted in shared buffers is likely the solution,
> rather, segregating it up might make more sense, again, along similar
> lines to keys and using metadata that's outside of the catalogs, which
> has been discussed previously, though I don't think anyone's actively
> working on it.

What is this trying to protect against? Without a clear case, I don't
see what that complexity is buying us.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
