Re: Transparent column encryption

From: Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>
To: Peter Eisentraut <peter(at)eisentraut(dot)org>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Transparent column encryption
Date: 2024-04-10 14:14:51
Message-ID: CAGECzQTT0nsypx0HFfNF6OWn_aUvKobieShZawJQdH3_F=Q3gA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 10 Apr 2024 at 12:13, Peter Eisentraut <peter(at)eisentraut(dot)org> wrote:
>
> To kick some things off for PG18, here is an updated version of the
> patch for automatic client-side column-level encryption.

I only read the docs and none of the code, but here is my feedback on
the current design:

> (The CEK can't be rotated easily, since
> that would require reading out all the data from a table/column and
> reencrypting it. We could/should add some custom tooling for that,
> but it wouldn't be a routine operation.)

This seems like something that requires some more thought because CEK
rotation seems just as important as CMK rotation (often both would be
compromised at the same time). As far as I can tell the only way to
rotate a CEK is by re-encrypting the column for all rows in a single
go at the client side, thus taking a write-lock on all rows of the
table. That seems quite problematic, because that makes key rotation
an operation that requires application downtime. Allowing online
rotation is important, otherwise almost no-one will do it preventative
at a regular interval.

One way to allow online CEK rotation is by allowing a column to be
encrypted by one of several keys and/or allow a key to have multiple
versions. And then for each row we would store which key/version it
was encrypted with. That way for new insertions/updates clients would
use the newest version. But clients would still be able to decrypt
both old rows with the old key and new rows encrypted with the new
key, because the server would give them both keys and tell which row
was encrypted with which. Then the old rows can be rewritten by a
client in small batches, so that writes to the table can keep working
while this operation takes place.

This could even be used to allow encrypting previously unencrypted
columns using something like "ALTER COLUMN mycol ENCRYPTION KEY cek1".
Then unencrypted rows could be indicated by e.g. returning something
like NULL for the CEK.

+ The plaintext inside
+ the ciphertext is always in text format, but this is invisible to the
+ protocol.

It seems like it would be useful to have a way of storing the
plaintext in binary form too. I'm not saying this should be part of
the initial version, but it would be good to keep that in mind with
the design.

+ The session-specific identifier of the key.

Is it necessary for this identifier to be session-specific? Why not
use a global identifier like an oid? Anything session specific makes
the job of transaction poolers quite a bit harder. If this identifier
would be global, then the message can be forwarded as is to the client
instead of re-mapping identifiers between clients and servers (like is
needed for prepared statements).

+ Additional algorithms may be added to this protocol specification without a
+ change in the protocol version number.

What's the reason for not requiring a version bump for this?

+ If the protocol extension <literal>_pq_.column_encryption</literal> is
+ enabled (see <xref linkend="protocol-flow-column-encryption"/>), then
+ there is also the following for each parameter:

It seems a little bit wasteful to include these for all columns, even
the ones that don't require encryption. How about only adding these
fields when format code is 0x11

Finally, I'm trying to figure out if this really needs to be a
protocol extension or if a protocol version bump would work as well
without introducing a lot of work for clients/poolers that don't care
about it (possibly with some modifications to the proposed protocol
changes). What makes this a bit difficult for me is that there's not
much written in the documentation on what is supposed to happen for
encrypted columns when the protocol extension is not enabled. Is the
content just returned/written like it would be with a bytea? Or is
writing disallowed because the format code would never be set to 0x11.

A related question to this is that currently libpq throws an error if
e.g. a master key realm is not defined but another one is. Is that
really what we want? Is not having one of the realms really that
different from not providing any realms at all?

But no-matter these behavioural details, I think it would be fairly
easy to add minimal "non-support" for this feature while supporting
the new protocol messages. All they would need to do is understand
what the new protocol messages/fields mean and either ignore them or
throw a clear error. For poolers it's a different story however. For
transaction pooling there's quite a bit of work to be done. I already
mentioned the session-specific ID being a problem, but even assuming
we change that to a global ID there's still difficulties. Key
information is only sent by the server if it wasn't sent before in the
session[1], so a pooler would need to keep it's own cache and send
keys to clients that haven't received them yet.

So yeah, I think it would make sense to put this behind a protocol
extension feature flag, since it's fairly niche and would require
significant work at the pooler side to support.

[1]:
+ When automatic client-side column-level encryption is enabled, the
+ messages ColumnMasterKey and ColumnEncryptionKey can appear before
+ RowDescription and ParameterDescription messages. Clients should collect
+ the information in these messages and keep them for the duration of the
+ connection. A server is not required to resend the key information for
+ each statement cycle if it was already sent during this connection.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2024-04-10 14:15:29 Re: post-freeze damage control
Previous Message Alvaro Herrera 2024-04-10 13:58:49 Re: Can't find not null constraint, but \d+ shows that