Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Antonin Houska <ah(at)cybertec(dot)at>, Sehrope Sarkuni <sehrope(at)jackdb(dot)com>, Joe Conway <mail(at)joeconway(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, "Moon, Insung" <Moon_Insung_i3(at)lab(dot)ntt(dot)co(dot)jp>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
Date: 2019-08-26 06:53:57
Message-ID: CAD21AoAoBRO3E9RFqkmz7ieoBoUR0az9UG16mXotcSC28juUAg@mail.gmail.com
Lists: pgsql-hackers

On Fri, Aug 23, 2019 at 11:35 PM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
>
> Greetings,
>
> * Bruce Momjian (bruce(at)momjian(dot)us) wrote:
> > On Fri, Aug 23, 2019 at 07:45:22AM -0400, Stephen Frost wrote:
> > > Having listed out the feature set of each of the other major databases
> > > when it comes to TDE is exactly how we objectively look at what is being
> > > done in the industry, and that then gives us an understanding of what
> > > users (and auditors) coming from other platforms will expect.
> > >
> > > I entirely agree that we shouldn't just copy N feature from X other
> > > database system unless we feel that's the best approach, but when every
> > > other database system out there has capability Y for the general feature
> > > X that we're thinking about implementing, we should be questioning an
> > > approach which doesn't include that.
> >
> > Agreed. The features of other databases are a clear source for what we
> > should consider and run through the useful/reasonable filter.
>
> Following on from that- when other databases don't have something that
> we're thinking about implementing, maybe we should be contemplating if
> it really makes sense as a requirement for us.
>
> Specifically in this case- I went back and tried to figure out what
> other database systems have an "encrypt EVERYTHING" option. I didn't
> have much luck finding one though. So I think we need to ask ourselves-
> the "check box" that we're trying to check off with TDE, do the other
> database system check that box? If so, then it looks like the "check
> box" isn't actually "encrypt EVERYTHING", it's more along the lines of
> "make sure all regular user data is encrypted automatically" or some
> such, and that's a very different requirement, which seems to be
> answered by the other systems by having a KMS + tablespace/database
> level encryption. We certainly shouldn't be putting a lot of effort
> into building something that is either overkill or won't be interesting
> to users due to limitations like "have to take the entire cluster
> offline to re-key it".
>
> Now, that KMS has to be encrypted using a master key, of course, and we
> have to make sure that it is able to survive across a crash, and it'd
> sure be nice if it was indexed. One option for such a KMS would be
> something entirely external (which could potentially just be another PG
> database or something) but it'd be nice if we had something built-in.
> We might also want it to be replicated (or maybe we don't, as was
> discussed on the call, to allow for a replica to use an independent set
> of keys- of course that leads to issues with pg_rewind and such though).

I think most users would expect a physical standby server to use the
same keys as the primary server, at least for the master key.
Otherwise they would need to switch to different keys on every
failover. The same goes for WAL encryption keys: since it's common for
restore_command to fetch archived WAL files produced by the primary
server (using scp, for example), the standby server needs to use the
same keys, or at least know them. In logical replication, I think we
would send unencrypted data and encrypt it on the subscriber, which is
initialized as a different database cluster, so the two sides can use
different keys.

> Anything built-in does seem like it'd be a fair bit of work to get it to
> address those requirements, but that does seem to be what the other
> database systems have done. Unfortunately, their documentation doesn't
> seem to really say exactly what they've done to address that.

I guess that this depends on the number of encryption keys we use. If
we have encryption keys per tablespace or per database, the number of
keys would be at most several dozen or several hundred. It would be
enough to keep them in a flat file on disk and load them into a hash
table in shared memory; we would not need a complex mechanism. OTOH,
if we have keys per table, we would need to consider indexes and
buffering, as the keys might not fit in memory.
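
To illustrate the per-tablespace case, here is a minimal sketch in
Python (not PostgreSQL code) of a flat keyring file parsed into an
in-memory hash table. The record layout (a 4-byte tablespace OID
followed by a 32-byte wrapped key) and the names dump_keyring and
load_keyring are assumptions made up for this example; they are not
taken from any patch.

```python
import struct

# Hypothetical record layout, assumed for illustration only:
# 4-byte unsigned tablespace OID followed by a 32-byte wrapped key.
RECORD = struct.Struct("<I32s")


def dump_keyring(keys):
    """Serialize {oid: wrapped_key} into the flat-file byte format."""
    for key in keys.values():
        if len(key) != 32:
            raise ValueError("wrapped keys must be exactly 32 bytes")
    return b"".join(RECORD.pack(oid, key) for oid, key in sorted(keys.items()))


def load_keyring(data):
    """Parse flat-file bytes into a dict keyed by tablespace OID."""
    if len(data) % RECORD.size != 0:
        raise ValueError("truncated keyring file")
    return {oid: key for oid, key in RECORD.iter_unpack(data)}
```

With a few hundred entries of 36 bytes each, the whole file is only a
few kilobytes, so reading it in one pass at startup should be cheap.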

> A couple random ideas that probably won't work, but I'll put them out
> there for others to shoot down-
>
> Some kind of 2-phase WAL pass, where we do WAL replay for the
> non-encrypted bits first (which would include the KMS) and then go back
> and WAL replay the encrypted stuff. Seems terrible.
>
> An independent WAL for the KMS only. Ugh, do we need another walwriter
> then? and buffers, and lots of other stuff.
>
> Some kind of flat-file based approach with a temp file and renaming of
> files using durable_rename(), like what we used to do with
> pg_shadow/authid, and now do with replorigin_checkpoint and such?

The PoC patch I created does that for the keyring file. On key
rotation, the corresponding WAL record contains all of the
re-encrypted keys together with the master key identifier, and
recovery restores the keyring file from it. One good point of this
approach is that external tools and the startup process can read the
file easily. It doesn't require backend code such as the system cache
and heap functions.
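
The temp-file-and-rename pattern mentioned above can be sketched as
follows. This is a Python illustration of the general technique on a
POSIX system, not the PoC patch and not PostgreSQL's durable_rename()
itself; the function name write_keyring_durably is hypothetical.

```python
import os
import tempfile


def write_keyring_durably(path, data):
    """Replace `path` with `data` so a crash leaves the old or new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the rename stays
    # within one filesystem and is therefore atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".keyring.tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # flush file contents to stable storage
        os.replace(tmp_path, path)  # atomically swap in the new file
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # fsync the directory so the rename itself survives a crash
    # (opening a directory this way is POSIX-specific).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A reader such as an external tool then only ever sees a complete
keyring file, either the previous version or the new one.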

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
