Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Antonin Houska <ah(at)cybertec(dot)at>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, "Moon, Insung" <Moon_Insung_i3(at)lab(dot)ntt(dot)co(dot)jp>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
Date: 2019-05-09 08:49:12
Message-ID: CAD21AoBUaqdRWhf=23b9w5U=kK21MszFq4go+cuysq6TH=J_Rg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, May 8, 2019 at 10:32 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Tue, May 7, 2019 at 2:10 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > > That better not be true. If you have a design where reading the WAL
> > > > lets you get *any* encryption key, you have a bad design, I think.
> >
> > How does the startup process decrypt WAL during recovery without
> > getting any encryption key if we encrypt user data in WAL by multiple
> > encryption keys?
>
> The keys have to be supplied from someplace outside of the database
> system. I am imagining a command that gets run with the key ID as an
> argument and is expected to print the key out on standard output for
> the server to read.
>
> I am not an encryption expert, but it's hard for me to imagine this
> working any other way. I mean, if you store the keys that you need
> for decryption inside the database, isn't that the same as storing
> your house key in your house, or your car key in your car? If you
> store your car key in the car, then either the car is locked from the
> outside, and the key is useless to you, or the car is unlocked from
> the outside, and the key is just as available to a thief as it is to
> you. Either way, it provides no security. What you do is keep your
> car key in your pocket or purse; if you try to start the car, it
> "requests" the key from you as proof that you are entitled to start
> it.

Agreed, keys for decryption must be stored outside of database.

> I think the database has to work similarly, except that rather
> than protecting the act of "starting" the database, each key is
> requested the first time it's needed, when it's discovered that we
> need to decrypt some data encrypted with that key.
>

It could depend on the design. In 2-tier key architecture that we
proposed, since all data keys that we need for encryption of table
data are encrypted and stored inside of database, we can get the
master key at once when starting database and decrypt all data keys.

> > > > Well, what threat are you trying to protect against?
> >
> > Data theft bypassing PostgreSQL's ACL, for example a malicious user
> > thefts storage devices and reads datbase files directly.
> >
> > I'm thinking that only users who have an access privilege of the
> > database object can get encryption key for the object. Therefore, when
> > a malicious user stole an encryption key by breaking the access
> > control system if we suppose data at rest encryption to serve as a yet
> > another access control layer we have to use the same encryption key
> > for WAL as that we used for database file. But I thought that we
> > should rather protect data from that situation by access control
> > system and managing encryption keys more robustly.
>
> I don't really follow that logic. If the encryption keys are managed
> robustly enough that they cannot be stolen, then we only need one. If
> there is still enough risk of key theft that we care to protect
> against it, we can't use a dedicated key for the WAL without
> increasing the risk.

In 2-tier key architecture design, the key dedicated for WAL (=WAL
data key) is stored inside of database and it never go out of
database, which is also true for data keys of tables and indexes .
The master key is per database cluster and it encrypts all data key as
well before storing them to the disk. Therefore when the master key is
stolen, a malicious user can see not only all data in WAL but also all
table data, because the all data keys are decrypted with the master
key. So I thought that the situation you're concerned is where a
malicious user can see a table data of that they don't have privilege
if they stole the master key, WAL data key and WAL but not for table
data. Is that right?

>
> > > > > FWIW, binary log encryption of MySQL uses different encryption key
> > > > > from a key used for table[1]. The key is encrypted by the master key
> > > > > for binary log encryption and is stored in each file headers.
> > > >
> > > > So, if you steal the master key for binary log encryption, you can
> > > > decrypt everything, it sounds like.
> >
> > Yes, I think so.
>
> I am not keen to copy that design. It sounds like having multiple
> keys in this design adds a lot of complexity without adding much
> security.
>
> > > > Data other than table and index data seems like it is not very
> > > > security-sensitive. I'm not sure we need to encrypt it at all. If we
> > > > do, using one key seems fine.
> >
> > Agreed. But it seems not to satisfy some user who require to encrypt
> > everything, which we discussed before.
>
> Agreed. I'm thinking possibly we need two different facilities.
> Facility #1 could be whole-database encryption: everything is
> encrypted with one key on a block level. And facility #2 could be
> per-table encryption: blocks for specific tables (and the related
> TOAST tables, indexes, and relation forks) are encrypted with specific
> keys and, in addition, the WAL records for those tables (and the
> related TOAST tables, indexes, and relation forks) are encrypted with
> the same key, but on a per-WAL-record level; the original WAL record
> would get "wrapped" by a new WAL record that just says "I am an
> encrypted WAL record, key ID %d, encrypted contents: %s" and you have
> to get the key to decrypt the contents and decrypt the real WAL record
> inside of it. Then you process that interior record as normal.
>
> I guess if you had both things, you'd want tables for which facility
> #2 was enabled to bypass facility #1, so that no relation data blocks
> were doubly-encrypted, to avoid the overhead. But a WAL record would
> be doubly-encrypted when both facilities are in use: the record would
> get encrypted with the per-table key, and then the blocks it got
> stored into would be encrypted with the cluster-wide key.

#2 also must encrypt system catalogs as well as specified user tables,
and temporary files are also encrypted. So the difference between #1
and #2 is whether to encrypt all data in WAL from the perspective of
encrypted objects? Or do you think that #1 encrypts anything other
objects or files such as large objects and backup_label? If the
difference is only WAL, #2 can cover #1 by encrypting all WAL records.

>
> > I wanted to say that if we encrypt whole database cluster by single
> > encryption key we would need to rebuilt the database cluster when
> > re-encrypt data. But if we encrypt data in tablespaces by per
> > tablespace encryption keys we can re-encrypt data by moving
> > tablespaces, without rebuilt it.
>
> Interesting. I suppose that would also be true of per-table keys.
> CREATE TABLE newthunk ENCRYPT WITH 'hoge' AS SELECT * FROM thunk; or
> something of that sort.
>
> Is there any real advantage of making this per-tablespace rather than
> per-table in PostgreSQL's architecture? In some other systems, all the
> stuff in a tablespace is glommed together into a big file or a raw
> disk partiton or something, so if you used different keys for
> different things in the tablespace then it might be hard to know which
> key to use for which blocks, but we've got separate files for each
> relation anyway. Now, that doesn't answer the question of how
> recovery, which can't do pg_class lookups, knows which key to use for
> which relation, but recovery can't do pg_tablespace lookups either.
> But I think there's a simple answer for that: the encrypted 'wrapper'
> WAL record must say which key should be used to decrypt the WAL record
> inside of it. And that must be the same key ID that should be used
> for the corresponding relation files that the WAL record touches. So
> no problem!

In terms of keys, one advantage could be that we have less keys with
per-tablespace keys.

Let me briefly explain the current design I'm thinking. The design
employees 2-tier key architecture. That is, a database cluster has one
master key and per-tablespace keys which are encrypted with the master
key before storing to disk. Each tablespace keys are generated
randomly inside database when CREATE TABLESPACE. The all encrypted
tablespace keys are stored together with the master key ID to the file
(say, $PGDATA/base/pg_tblsp_keys). That way, the startup process can
easily get all tablespace keys and the master key ID before starting
recovery, and therefore can get the all decrypted tablespace keys. The
reason why it doesn't store per-tablespace keys in a column of
pg_tabelspace is that we also encrypt pg_tablespace with the
tablespace key. We could take a way to not encrypt only pg_tablespace,
however it instead would require to scan pg_tablespace before
recovery, and eventually we would need to not encrypt pg_attribute
that should be encrypted.

During the recovery I'm also thinking the idea you suggested; wrapper
WAL records have tablespace OID that is the lookup key for tablespace
key and the startup process can get the tablespace key.

Given that the above design less data keys is better. Obviously
per-tablespace keys are less than per-table keys. And even if we
employee per-tablespace keys we can allow user to specify per-table
encryption by using the same encryption key within the tablespace.

FYI one advantage of per-tablespace encryption from user perspective
would be less conversion when database migration. Using
default_tablespace parameter we need less modification of create table
DDL.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2019-05-09 09:12:09 Re: pgsql: Add strict_multi_assignment and too_many_rows plpgsql checks
Previous Message Thomas Munro 2019-05-09 08:43:06 Re: We're leaking predicate locks in HEAD