Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Antonin Houska <ah(at)cybertec(dot)at>, Sehrope Sarkuni <sehrope(at)jackdb(dot)com>, Joe Conway <mail(at)joeconway(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, "Moon, Insung" <Moon_Insung_i3(at)lab(dot)ntt(dot)co(dot)jp>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
Date: 2019-08-24 02:04:13
Message-ID: 20190824020413.GS16436@tamriel.snowman.net
Lists: pgsql-hackers

Greetings,

* Bruce Momjian (bruce(at)momjian(dot)us) wrote:
> On Fri, Aug 23, 2019 at 10:35:17AM -0400, Stephen Frost wrote:
> > Following on from that- when other databases don't have something that
> > we're thinking about implementing, maybe we should be contemplating if
> > it really makes sense as a requirement for us.
>
> Yes, that's a good point.
>
> > Specifically in this case- I went back and tried to figure out what
> > other database systems have an "encrypt EVERYTHING" option. I didn't
> > have much luck finding one though. So I think we need to ask ourselves-
> > the "check box" that we're trying to check off with TDE, do the other
> > database systems check that box? If so, then it looks like the "check
> > box" isn't actually "encrypt EVERYTHING", it's more along the lines of
> > "make sure all regular user data is encrypted automatically" or some
> > such, and that's a very different requirement, which seems to be
> > answered by the other systems by having a KMS + tablespace/database
> > level encryption. We certainly shouldn't be putting a lot of effort
> > into building something that is either overkill or won't be interesting
> > to users due to limitations like "have to take the entire cluster
> > offline to re-key it".
>
> Well, I think they might do that to reduce encryption overhead. I think
> tests have shown that is not an issue, but we will need to test further.

I seriously doubt that's why and I don't think there's actually much
value in trying to figure out the "why" here- the question is, do those
systems answer the check-box requirement that was brought up on the call
as the justification for this feature? If so, then clearly not
everything is required to be encrypted and we shouldn't be stressing
over trying to do that.

> I am not sure of the downside of encrypting everything, since it leaks
> the least information and has a minimal user API and code impact. What
> is the value of encrypting only the user rows? Better key control?

Yes, better key control, and better user API, and avoiding having an
implementation that isn't actually what people either expect or want. I
don't agree at all that this distinction has a "minimal user API
impact"- much of the reason we were throwing out the idea of having a
proper KMS for the "bulk data encryption", at least from what I gathered
on the call, is because of the issues around having to try and bootstrap
a fully encrypted system and deal with crash recovery and hypothesized
leaks. If we can accept that it's alright for some data to be
unencrypted, then that certainly makes life easier for us, and from what
it looks like, that's pretty typical in industry. I daresay it seems
likely that could get us all the way to table-level encryption of whole
tuples as discussed elsewhere. I also had a side-chat with Sehrope
where I believe I explained why the concern regarding TIDs and ordering
isn't actually valid either; it would be great if we could discuss that
at some point. I'd be happy to chat with you about it first and then,
if we agree, write up the discussion for the list.

> > Now, that KMS has to be encrypted using a master key, of course, and we
> > have to make sure that it is able to survive across a crash, and it'd
> > sure be nice if it was indexed. One option for such a KMS would be
> > something entirely external (which could potentially just be another PG
> > database or something) but it'd be nice if we had something built-in.
> > We might also want it to be replicated (or maybe we don't, as was
> > discussed on the call, to allow for a replica to use an independent set
> > of keys- of course that leads to issues with pg_rewind and such though).
>
> I think the replica could use a different key for the relations, but the
> WAL key would have to be the same.

This depends on how the WAL is sent to the replica-- if it's sent
unencrypted then the replica could have a different key, at least
potentially. There are some very interesting questions around pg_rewind
support and archive_mode = always, but that's pretty far down the road
and we may have to tell users that they have to choose whether they
want support for those features.

> > Anything built-in does seem like it'd be a fair bit of work to get it to
> > address those requirements, but that does seem to be what the other
> > database systems have done. Unfortunately, their documentation doesn't
> > seem to really say exactly what they've done to address that.
>
> I do like the pgcrypto key support to be per-database so pg_dump will
> dump the data encrypted, and with its locked keys.

Yes, a built-in KMS would also need pg_dump support.

Thanks,

Stephen
