Re: Transparent column encryption

From: Peter Eisentraut <peter(at)eisentraut(dot)org>
To: Jelte Fennema-Nio <postgres(at)jeltef(dot)nl>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Transparent column encryption
Date: 2024-04-18 11:24:59
Message-ID: 93991595-374a-4217-a988-287a23d1ad3f@eisentraut.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 10.04.24 16:14, Jelte Fennema-Nio wrote:
>> (The CEK can't be rotated easily, since
>> that would require reading out all the data from a table/column and
>> reencrypting it. We could/should add some custom tooling for that,
>> but it wouldn't be a routine operation.)
>
> This seems like something that requires some more thought because CEK
> rotation seems just as important as CMK rotation (often both would be
> compromised at the same time).

Hopefully, the reason for key rotation is mainly that policies require
key rotation, not that keys get compromised all the time. That's the
reason for having this two-tier key system in the first place. This
seems pretty standard to me. For example, I can change the password on
my laptop's file system encryption, which somehow wraps a lower-level
key, but I can't reencrypt the actual file system in place.

> + The plaintext inside
> + the ciphertext is always in text format, but this is invisible to the
> + protocol.
>
> It seems like it would be useful to have a way of storing the
> plaintext in binary form too. I'm not saying this should be part of
> the initial version, but it would be good to keep that in mind with
> the design.

Two problems here: One, for deterministic encryption, everyone needs to
agree on the representation, otherwise equality comparisons won't work.
Two, if you give clients the option of storing text or binary, then
clients also get back a mix of text or binary, and it will be a mess.
Just giving the option of storing the payload in binary wouldn't be that
hard, but it's not clear what you can sensibly do with that in the end.

> + The session-specific identifier of the key.
>
> Is it necessary for this identifier to be session-specific? Why not
> use a global identifier like an oid? Anything session specific makes
> the job of transaction poolers quite a bit harder. If this identifier
> would be global, then the message can be forwarded as is to the client
> instead of re-mapping identifiers between clients and servers (like is
> needed for prepared statements).

The point was just to avoid saying specifically that the OID will be
sent, because then that would tie the catalog representation to the
protocol, which seems unnecessary. Maybe we can reword that somehow.

In terms of connection pooling, this feature as it is conceived right
now would only work in session pooling anyway. Even if the identifiers
somehow were global (but OIDs can also change and are not guaranteed
unique forever), the state of which keys have already been sent is
session state.

> + Additional algorithms may be added to this protocol specification without a
> + change in the protocol version number.
>
> What's the reason for not requiring a version bump for this?

This is kind of like SASL or TLS can add new methods dynamically without
requiring a new version. I mean, as we are learning, making new
protocol versions is kind of hard, so the point was to avoid it.

> + If the protocol extension <literal>_pq_.column_encryption</literal> is
> + enabled (see <xref linkend="protocol-flow-column-encryption"/>), then
> + there is also the following for each parameter:
>
> It seems a little bit wasteful to include these for all columns, even
> the ones that don't require encryption. How about only adding these
> fields when format code is 0x11

I guess you could do that, but wouldn't that making the decoding of
these messages much more complicated? You would first have to read the
"short" variant, decode the format, and then decide to read the rest.
Seems weird.

> Finally, I'm trying to figure out if this really needs to be a
> protocol extension or if a protocol version bump would work as well
> without introducing a lot of work for clients/poolers that don't care
> about it (possibly with some modifications to the proposed protocol
> changes).

That's not something this patch cares about, but the philosophical
discussions in the other thread on protocol versioning etc. appear to
lean toward protocol extension.

> What makes this a bit difficult for me is that there's not
> much written in the documentation on what is supposed to happen for
> encrypted columns when the protocol extension is not enabled. Is the
> content just returned/written like it would be with a bytea?

Yes, that's what would happen, and that's the intention, so that for
example you can use pg_dump to back up encrypted columns without having
to decrypt them.

> A related question to this is that currently libpq throws an error if
> e.g. a master key realm is not defined but another one is. Is that
> really what we want? Is not having one of the realms really that
> different from not providing any realms at all?

Can you provide a more concrete example of what scenario you have a
concern about?

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2024-04-18 11:25:23 Re: AIX support
Previous Message Sriram RK 2024-04-18 11:15:43 Re: AIX support