Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: "Moon, Insung" <Moon_Insung_i3(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
Date: 2019-02-07 08:25:49
Message-ID: CAD21AoAqtytk0iH6diCJW24oyJdS4roN-VhrFD53HcNP0s8pzA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Feb 7, 2019 at 9:27 AM Moon, Insung
<Moon_Insung_i3(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>
> Dear Ibrar Ahmed.
>
> From: Ibrar Ahmed [mailto:ibrar(dot)ahmad(at)gmail(dot)com]
> Sent: Thursday, February 07, 2019 4:09 AM
> To: Moon, Insung
> Cc: Tom Lane; PostgreSQL-development
> Subject: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
>
>
> On Tue, Jul 3, 2018 at 5:37 PM Moon, Insung <Moon_Insung_i3(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Dear Tom Lane.
>
> > -----Original Message-----
> > From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
> > Sent: Monday, June 18, 2018 11:52 PM
> > To: Robert Haas
> > Cc: Joe Conway; Masahiko Sawada; Moon, Insung; PostgreSQL-development
> > Subject: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
> >
> > Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > > On Mon, Jun 18, 2018 at 10:12 AM, Joe Conway <mail(at)joeconway(dot)com> wrote:
> > >> Not necessarily. Our pages probably have enough predictable bytes to
> > >> aid cryptanalysis, compared to user data in a column which might not
> > >> be very predictable.
> >
> > > Really? I would guess that the amount of entropy in a page is WAY
> > > higher than in an individual column value.
> >
> > Depending on the specifics of the encryption scheme, having some amount of known (or guessable) plaintext may allow breaking
> > the cipher, even if much of the plaintext is not known. This is cryptology 101, really.
> >
> > At the same time, having to have a bunch of independently-decipherable short field values is not real secure either, especially
> > if they're known to all be encrypted with the same key. But what you know or can guess about the plaintext in such cases
> > would be target-specific, rather than an attack that could be built once and used against any PG database.
>
> > > Yes. If part of the plaintext of the encrypted data is known or guessable, it may be possible to decrypt the encrypted data.
> > >
> > > But would it be safe to use an additional encryption mode such as GCM or XTS to solve this problem?
> > > (Without reusing the same IV)
> > > Thank you and Best regards.
> > > Moon.
> >
> > >
> > > regards, tom lane
>
>
>
>
>
> > Hi Moon,
> >
> > Have you made progress on that patch? I am thinking of working on this project and found that you are already working on it. The last message is almost six months old. I want to check with you whether you are still working on it; if so, I can help by reviewing the patch, etc. If you are not working on it anymore, can you share the work you have done (if possible)?
> > --
> > Ibrar Ahmed
>
> We are currently developing TDE and KMS integration.
> We will be prepared to start a new discussion with a PoC patch as soon as possible.
>
> Currently, we have changed the development direction from per-table to per-tablespace encryption.
> We are also researching how to use the KMIP protocol for encryption keys in order to integrate with KMS.
> We talked about this in the Unconference session of PGConf.ASIA,
> and a week ago we talked about the development direction of TDE and integration with KMS at FOSDEM PGDAY[1].
>
> We will soon provide a PoC along with a new discussion.
>
> Regards.
>
> [1] TRANSPARENT DATA ENCRYPTION IN POSTGRESQL AND INTEGRATION WITH KEY MANAGEMENT SERVICES
> https://www.postgresql.eu/events/fosdem2019/schedule/session/2307-transparent-data-encryption-in-postgresql-and-integration-with-key-management-services/
>

Let me share the details of progress and current state.

As our presentation slides describe, I've written PoC code for
transparent encryption that uses a 2-tier key architecture and
supports key rotation. We've discussed the design of transparent
database encryption on -hackers so far, and I think we have found a
good design and implementation. I will share them along with our
research results. But I think the design of the integration of
PostgreSQL with key management services (KMS) is more controversial.

For integration with KMS, I'm going to propose adding generic key
management APIs to PostgreSQL core so that it can communicate with
KMSs supporting different interfaces and protocols and can get the
master key (of the 2-tier key architecture) from them. Users can
choose a key management plugin according to their environment.

The integration of PostgreSQL with KMS should be a separate patch from
the TDE patch, and we think that TDE can be done first. But at least
it's essential to provide a way to get the master key from an external
location. Therefore, as the first step we can propose the basic
components of TDE with a simple interface to get the master key from
KMS rather than supporting full key management APIs. The basic
components of TDE that we're going to propose are:

* Transparent encryption at a layer between shared buffer and OS page cache
* Per-tablespace encryption
* 2-tier key architecture
* Key rotation
* Encryption of system catalogs and temporary files

WAL encryption will follow as an additional feature.

The simple interface to get the master key is a GUC parameter that
stores a shell command, say get_encryption_key_command. As its name
suggests, the command is used only for getting the master key and is
never used for key removal or registration.

The slides explain the TDE feature in detail but don't say much about
KMS. So let me share a rough idea of using TDE in combination with
KMS.

2-Tier Key Architecture and Key Generation
=================================

In our design, we use a 2-tier key architecture which uses two types
of keys: one master key and multiple data encryption keys. As the
slides explain in detail, the benefit of this architecture is fast key
rotation: at key rotation, the only data to re-encrypt is the data
encryption keys.

The key generation number is an integer value starting from 1, used
for identifying the master key. It's initialized at initdb time and
incremented whenever the master key is changed (i.e. at key rotation).
For each key generation number we have multiple data encryption keys
associated with tablespaces. The current key generation number is
written to checkpoint records. When starting up, the startup process
executes the shell command set in the get_encryption_key_command GUC
parameter with the key generation number.

For example, we can set something like get_encryption_key_command =
'/bin/sh get_key_from_kms.sh %g', where '%g' is replaced with the
current key generation number and where 'get_key_from_kms.sh' is an
arbitrary shell script to get the master key from a KMS. I assume that
each master key on the KMS can be identified by its ID. So the DBA
generates a master key identified by a key ID in an arbitrary form on
the KMS beforehand, and the get_encryption_key_command has to craft
the key ID in the same manner and pass it to the KMS. The command
writes the master key it obtained to stdout.

Therefore, the contract between PostgreSQL and the user is:
* The user must prepare the master key identified by a unique key ID in advance
* The shell command crafts the key ID in the same form as the key ID on the KMS
* The user must remove old keys from the KMS if necessary (because there is no
interface other than getting the master key)
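
To make the contract above concrete, here is a minimal Python sketch
of how the '%g' substitution and the command execution could behave.
It is only an illustration, not the PoC code: build_key_command() and
fetch_master_key() are hypothetical names, and the plain string
replacement (with no handling of an escaped '%%') is an assumption.

```python
import shlex
import subprocess
from typing import List


def build_key_command(template: str, generation: int) -> List[str]:
    """Replace the %g placeholder with the key generation number.

    Hypothetical helper: a plain string replacement, split into argv.
    """
    if generation < 1:
        raise ValueError("key generation numbers start from 1")
    return shlex.split(template.replace("%g", str(generation)))


def fetch_master_key(template: str, generation: int) -> bytes:
    """Run the command and return the key it wrote to stdout."""
    argv = build_key_command(template, generation)
    # check=True raises CalledProcessError on a non-zero exit status,
    # which the caller can turn into an error report.
    result = subprocess.run(argv, capture_output=True, check=True)
    key = result.stdout.strip()
    if not key:
        raise RuntimeError("key command produced no output")
    return key
```

With the template '/bin/sh get_key_from_kms.sh %g' and generation 2,
build_key_command() yields the argv for '/bin/sh get_key_from_kms.sh 2'.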

Initial Setup and Recovery
====================

Since the user data could be encrypted, we need the data encryption
keys and the master key even during recovery. The startup process
executes get_encryption_key_command with the key generation number
written in the checkpoint record and stores the master key in shared
memory.

For example, if we craft the master key ID in the form of 'ABC_<key
generation number>', the operation steps from initdb to recovery will
be as follows.

1. User creates the first-generation master key with ID 'ABC_1' on KMS
2. User executes initdb and sets get_encryption_key_command = '/bin/sh
get_key_from_kms.sh %g' in postgresql.conf
3. Start PostgreSQL
    3-1. If transparent encryption is disabled or there is no
    encrypted data in the database, go to step #4
    3-2. The startup process executes '/bin/sh get_key_from_kms.sh 1'
    because the current (initial) key generation is 1
    3-3. get_key_from_kms.sh crafts the key ID 'ABC_1' in the same
    form and gets the master key from KMS
    3-4. If that fails, raise a FATAL error
    3-5. Store the master key in shared memory
    3-6. If there are data encryption keys, decrypt them using the
    master key
4. Recovery starts

To make sure that we got the correct master key, we can save the hash
value of the master key in the database cluster and compare the two.
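
The hash check could look roughly like the following Python sketch.
The choice of SHA-256 and the names key_fingerprint() and
verify_master_key() are assumptions for illustration; the proposal
does not fix a hash function or a storage location.

```python
import hashlib
import hmac


def key_fingerprint(master_key: bytes) -> str:
    """Return the hex SHA-256 digest stored alongside the cluster.

    SHA-256 is an assumed choice, not part of the proposal.
    """
    return hashlib.sha256(master_key).hexdigest()


def verify_master_key(master_key: bytes, stored_fingerprint: str) -> bool:
    """Compare the fetched key's digest with the stored one.

    compare_digest() takes time independent of where the strings differ.
    """
    return hmac.compare_digest(key_fingerprint(master_key), stored_fingerprint)
```

The fingerprint would be saved when the key is first used, and
verify_master_key() would run after each fetch, before any data
encryption key is decrypted.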

Key Rotation
===========

When a user requests key rotation (via an SQL command or function),
the backends execute get_encryption_key_command with the new key
generation number. It re-encrypts all existing data encryption keys
with the new master key and increments the current key generation
number. Similar to initialization time, we need to prepare the new
master key on the KMS before executing the key rotation.

So, for example, the operation steps will be as follows:

1. Create the second-generation master key with key ID 'ABC_2' on KMS
2. Execute key rotation on PostgreSQL (by calling the
pg_rotate_encryption_key() function)
    3-1. The backends execute '/bin/sh get_key_from_kms.sh 2', where 2
    is the next key generation number
    3-2. It crafts the key ID 'ABC_2' in the same manner and gets the
    new master key from KMS
    3-3. If that fails, raise an error
    3-4. Re-encrypt the data encryption keys using the new master key
    3-5. Increment the current key generation to 2

Of course, some locking is required here.
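
The rotation steps can be sketched in Python as below, ignoring
locking. This is a toy model under stated assumptions: toy_wrap() is
an insecure XOR placeholder standing in for a real key-wrapping
algorithm, and KeyStore and rotate_master_key() are hypothetical
names, not the PoC's data structures. The point it illustrates is that
only the data encryption keys are re-encrypted, and that the
generation number changes only after every key has been re-wrapped.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict


def toy_wrap(master_key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256-derived keystream.

    NOT secure; a placeholder for a real key-wrap algorithm.
    """
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(master_key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


toy_unwrap = toy_wrap  # XOR is its own inverse


@dataclass
class KeyStore:
    generation: int                # current key generation number
    wrapped_deks: Dict[str, bytes]  # tablespace name -> wrapped data key


def rotate_master_key(store: KeyStore,
                      get_master_key: Callable[[int], bytes]) -> None:
    """Re-wrap every data encryption key under the next-generation master key."""
    old_key = get_master_key(store.generation)
    new_generation = store.generation + 1
    # Raises if the new key was not prepared beforehand.
    new_key = get_master_key(new_generation)
    rewrapped = {
        name: toy_wrap(new_key, toy_unwrap(old_key, wrapped))
        for name, wrapped in store.wrapped_deks.items()
    }
    # Publish the new state only after all keys were re-wrapped.
    store.wrapped_deks = rewrapped
    store.generation = new_generation
```

If fetching the new master key fails, the store is left untouched, so
the old generation stays usable.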

Integration with KMS
================

The above design has some restrictions on administration but might be
enough for a few use cases. But I think that these inconveniences will
go away if we have KMS integration. Since KMIP supports key management
operations such as key registration and key removal, the key
management plugin will be responsible for registering the master key
and getting it using a key ID generated in a unified form. So all the
user needs to do is set up the KMS and the key management plugin. The
user no longer needs to create and remove the master key
manually.
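
As a rough picture of what such a plugin could expose, here is a
Python sketch. Everything in it is hypothetical: the
KeyManagementPlugin interface, the InMemoryPlugin stand-in for a real
KMS, and the '<name>_<generation>' key ID format (borrowed from the
'ABC_1' example above) are assumptions, since the API has not been
designed yet.

```python
import os
from abc import ABC, abstractmethod
from typing import Dict


class KeyManagementPlugin(ABC):
    """Hypothetical plugin interface: register, get and remove keys."""

    @abstractmethod
    def register_key(self, key_id: str, key: bytes) -> None: ...

    @abstractmethod
    def get_key(self, key_id: str) -> bytes: ...

    @abstractmethod
    def remove_key(self, key_id: str) -> None: ...


class InMemoryPlugin(KeyManagementPlugin):
    """Stand-in for a real KMS client; keeps keys in a dict."""

    def __init__(self) -> None:
        self._keys: Dict[str, bytes] = {}

    def register_key(self, key_id: str, key: bytes) -> None:
        if key_id in self._keys:
            raise ValueError(f"key {key_id!r} already registered")
        self._keys[key_id] = key

    def get_key(self, key_id: str) -> bytes:
        return self._keys[key_id]

    def remove_key(self, key_id: str) -> None:
        del self._keys[key_id]


def make_key_id(name: str, generation: int) -> str:
    """Build the key ID in one unified form (assumed format)."""
    return f"{name}_{generation}"


def create_master_key(plugin: KeyManagementPlugin, name: str,
                      generation: int) -> bytes:
    """Generate a 256-bit master key and register it via the plugin."""
    key = os.urandom(32)
    plugin.register_key(make_key_id(name, generation), key)
    return key
```

Because the server side builds the key ID itself, the user would not
have to keep a shell script and the KMS in sync by hand.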

Other use cases of integrating with KMS
==============================

BTW, we can have not only internal key management interfaces for the
TDE feature but also an SQL interface for existing use cases such as
pgcrypto. Currently we need to pass the password to the encryption and
decryption functions.

SELECT decrypt(data, 'secret-key', 'aes') FROM ...;

The password will be written to the server log when log_statement =
'all'. But with a KMS it would become:

SELECT decrypt(data, get_encryption_key('keyid'), 'aes') FROM ...;

where the get_encryption_key() function gets the encryption key from
the KMS via the loaded plugin. The key string is never written to the
server logs.

We're still researching the details of KMIP and key management APIs.
We will share updates. Feedback is very welcome and we're open to new
ideas.

Thank you for reading.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
