WIP: Data at rest encryption

From: Ants Aasma <ants(dot)aasma(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: WIP: Data at rest encryption
Date: 2016-06-07 13:56:48
Message-ID: CA+CSw_tb3bk5i7if6inZFc3yyf+9HEVNTy51QFBoeUk7UE_V=w@mail.gmail.com
Lists: pgsql-hackers

Hi all,

I have been working on data-at-rest encryption support for PostgreSQL.
In my experience this is a common request that customers make. In
short, all PostgreSQL data files are encrypted with a single master
key and are decrypted when read from the OS. The feature does not
provide column-level encryption, which is an almost orthogonal
feature and arguably better done on the client side.

Similar things can be achieved with filesystem-level encryption.
However, this is not always optimal, for various reasons. One of the
better reasons is the desire for HSM-based encryption in a setup
built on a storage area network.

Attached to this mail is a work-in-progress patch that adds an
extensible encryption mechanism. There are some loose ends left to tie
up, but the general concept and architecture are at a point where
they are ready for some feedback, fresh ideas and bikeshedding.

Usage
=====

Set up the database like so:

(read -sp "Postgres passphrase: " PGENCRYPTIONKEY; echo;
export PGENCRYPTIONKEY
initdb -k -K pgcrypto $PGDATA )

Start PostgreSQL:

(read -sp "Postgres passphrase: " PGENCRYPTIONKEY; echo;
export PGENCRYPTIONKEY
postgres $PGDATA )

Design
======

The patch adds a new GUC called encryption_library. When it is set,
the named library is loaded before shared_preload_libraries and is
expected to register its encryption routines. For now the API is
pretty narrow: one parameterless function that lets the extension do
key setup on its own terms, and two functions for
encrypting/decrypting an arbitrarily sized block of data with a tweak.
The tweak should alter the encryption function so that identical block
contents are encrypted differently based on their location. The GUC
needs to be set at bootstrap time, so it gets set by a new option for
initdb. During bootstrap an encryption sample gets stored in the
control file, enabling useful error messages.
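
To make the shape of that API concrete, here is a minimal sketch of
what such a callback table could look like. It is not taken from the
patch: the struct, the function names, the 16-byte tweak width and the
sample check are all assumptions made for illustration, and the "toy"
library is a plain XOR that is NOT secure. The sample check assumes
the control-file sample is a known constant encrypted at bootstrap and
compared again at startup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TWEAK_SIZE 16			/* assumed tweak width, not from the patch */

/* Hypothetical callback table; the names in the real patch may differ. */
typedef struct EncryptionRoutines
{
	void		(*setup_key) (void);	/* parameterless key setup */
	void		(*encrypt_block) (const uint8_t *in, uint8_t *out,
								  size_t len, const uint8_t *tweak);
	void		(*decrypt_block) (const uint8_t *in, uint8_t *out,
								  size_t len, const uint8_t *tweak);
} EncryptionRoutines;

static const EncryptionRoutines *registered = NULL;

/* Called by the library when it is loaded. */
static void
register_encryption_routines(const EncryptionRoutines *routines)
{
	assert(routines != NULL);
	registered = routines;
}

/* ---- toy library: XOR only, illustration only, NOT secure ---- */
static uint8_t toy_key[TWEAK_SIZE];

/* Fold the passphrase from the environment into a 16-byte toy key. */
static void
toy_setup_key(void)
{
	const char *pass = getenv("PGENCRYPTIONKEY");
	size_t		i, n;

	if (pass == NULL)
		pass = "unset";
	n = strlen(pass);
	memset(toy_key, 0, sizeof(toy_key));
	for (i = 0; i < n; i++)
		toy_key[i % TWEAK_SIZE] ^= (uint8_t) pass[i];
}

/* XOR is its own inverse, so one function serves both directions. */
static void
toy_crypt(const uint8_t *in, uint8_t *out, size_t len, const uint8_t *tweak)
{
	size_t		i;

	for (i = 0; i < len; i++)
		out[i] = in[i] ^ toy_key[i % TWEAK_SIZE] ^ tweak[i % TWEAK_SIZE];
}

static const EncryptionRoutines toy_routines = {
	toy_setup_key, toy_crypt, toy_crypt
};

/* Encrypt a known constant with an all-zero tweak (assumed scheme). */
static void
make_sample(uint8_t sample[TWEAK_SIZE])
{
	static const uint8_t known[TWEAK_SIZE] = {0x50, 0x47};
	static const uint8_t zero_tweak[TWEAK_SIZE] = {0};

	registered->encrypt_block(known, sample, TWEAK_SIZE, zero_tweak);
}

/* Returns 1 if the current key reproduces the stored sample. */
static int
key_matches_sample(const uint8_t stored[TWEAK_SIZE])
{
	uint8_t		now[TWEAK_SIZE];

	make_sample(now);
	return memcmp(now, stored, TWEAK_SIZE) == 0;
}
```

A caller would register toy_routines, call registered->setup_key()
once, and then pass a location-derived tweak with every block. With
this toy cipher, the same plaintext under two different tweaks yields
different ciphertext, which is the property the tweak is meant to
provide.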

The library name is not stored in controldata. I'm not quite sure
about this decision. On one hand it would be very useful to tell the
user what he needs to get at his data if the configuration somehow
goes missing and it would get rid of the extra GUC. On the other hand
I don't really want to bloat control data, and the same encryption
algorithm could be provided by different implementations.

For now, encryption is done for everything that goes through md,
xlog and slru. Based on a review of read/write/fread/fwrite calls, the
following are still missing from that list:

* BufFile - needs refactoring
* Logical reorder buffer serialization - probably needs a stream mode
cipher API addition.
* logical_heap_rewrite - can be encrypted as one big block
* 2PC state data - ditto
* pg_stat_statements - query texts get appended so a stream mode
cipher might be needed here too.

copydir needed some changes too. Because the tablespace and database
oids are included in the tweak, copying a file also needs to decrypt
each block with the old tweak and re-encrypt it with the new one.
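
The copy problem can be sketched as follows. Only the fact that the
tablespace and database oids are part of the tweak comes from the
patch description; the struct, the 16-byte layout with a relation
number and block number, and the XOR stand-in cipher are hypothetical
and chosen only to keep the example self-contained.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LOC_TWEAK_SIZE 16		/* assumed tweak width */

/* Hypothetical description of where a block lives. */
typedef struct BlockLocation
{
	uint32_t	spc_oid;		/* tablespace oid */
	uint32_t	db_oid;			/* database oid */
	uint32_t	rel_number;		/* assumed: relation file number */
	uint32_t	block_number;	/* assumed: block within the file */
} BlockLocation;

/*
 * Pack the location into a tweak. Host byte order is used here for
 * brevity; a real on-disk format would have to fix the byte order.
 */
static void
build_tweak(const BlockLocation *loc, uint8_t tweak[LOC_TWEAK_SIZE])
{
	memcpy(tweak, &loc->spc_oid, 4);
	memcpy(tweak + 4, &loc->db_oid, 4);
	memcpy(tweak + 8, &loc->rel_number, 4);
	memcpy(tweak + 12, &loc->block_number, 4);
}

/* Stand-in cipher: XOR with the tweak and a constant. NOT secure. */
static void
demo_crypt(const uint8_t *in, uint8_t *out, size_t len, const uint8_t *tweak)
{
	size_t		i;

	for (i = 0; i < len; i++)
		out[i] = in[i] ^ tweak[i % LOC_TWEAK_SIZE] ^ 0x5A;
}

/* Decrypt with the source tweak, then encrypt with the destination one. */
static void
reencrypt_block(uint8_t *buf, size_t len,
				const BlockLocation *src, const BlockLocation *dst)
{
	uint8_t		src_tweak[LOC_TWEAK_SIZE];
	uint8_t		dst_tweak[LOC_TWEAK_SIZE];

	build_tweak(src, src_tweak);
	build_tweak(dst, dst_tweak);
	demo_crypt(buf, buf, len, src_tweak);	/* back to plaintext */
	demo_crypt(buf, buf, len, dst_tweak);	/* ciphertext for new location */
}
```

A byte-for-byte copy into a database with a different oid would
decrypt to garbage at the destination, because the destination tweak
no longer matches the one used when the block was written.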

For demonstration purposes I imported Brian Gladman's AES-128-XTS mode
implementation into pgcrypto and used an environment variable for key
setup. This part is not really in any reviewable state: the XTS code
needs heavy cleanup to bring it up to PostgreSQL coding standards, and
key setup needs something secure, like PBKDF2 or scrypt.

Performance with the current AES implementation is not great, but not
horrible either: I'm seeing around a 2x slowdown for workloads that
are larger than shared_buffers but smaller than free memory. However,
the plan is to fix this. I have a prototype AES-NI implementation that
does 3GB/s per core on my Haswell based laptop (1.25 B/cycle).
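
As a rough cross-check of those two figures, the small helpers below
derive the clock speed they imply and the time to process one page.
The assumptions are that GB means 10^9 bytes and that pages use the
default 8192-byte block size; neither is stated above.

```c
#include <assert.h>

/* Clock rate implied by a throughput and a bytes-per-cycle figure. */
static double
implied_clock_hz(double bytes_per_sec, double bytes_per_cycle)
{
	assert(bytes_per_cycle > 0.0);
	return bytes_per_sec / bytes_per_cycle;
}

/* Time needed to encrypt or decrypt one page at a given throughput. */
static double
seconds_per_page(double bytes_per_sec, double page_bytes)
{
	assert(bytes_per_sec > 0.0);
	return page_bytes / bytes_per_sec;
}
```

With 3e9 bytes/s and 1.25 B/cycle this gives an implied clock of
2.4 GHz, and roughly 2.7 microseconds per 8192-byte page on one core.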

Open questions
==============

The main question is what to do about BufFile. It currently provides
both unaligned random access and a block-based interface. I wonder if
it would be a good idea to refactor it to be fully block-based under
the covers.

I would also like to incorporate some database identifier as a salt in
key setup. However, the system identifier stored in the control file
doesn't fit this role well. It gets initialized somewhat too late in
the bootstrap process and, more importantly, gets changed on
pg_upgrade. Using it as a salt would make link mode upgrades
impossible, which seems like a no-go. I'm torn on whether to add a new
value for this purpose (perhaps stored outside the control file) or to
allow setting the system identifier via initdb. The first seems like a
better idea, since the file could double as a place to store
additional encryption parameters, like key length or a different
cipher primitive.

Regards,
Ants Aasma

Attachment Content-Type Size
data-at-rest-encryption-wip-2016.06.07.patch text/x-patch 92.1 KB
