Re: Transparent column encryption

From: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Transparent column encryption
Date: 2023-03-30 20:10:02
Message-ID: b12ab6ca-5a87-e79f-f03b-02b42c42e6bd@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 30.03.23 20:35, Stephen Frost wrote:
> I do feel that column encryption is a useful capability and there's
> large parts of this approach that I agree with, but I dislike the idea
> of having our clients be able to depend on what gets returned for
> non-encrypted columns while not being able to trust what encrypted
> column results are and then trying to say it's 'transparent'. To that
> end, it seems like just saying they get back a bytea and making it clear
> that they have to provide the validation would be clear, while keeping
> much of the rest.

[Note that the word "transparent" has been removed from the feature
name. I just didn't change the email thread name.]

These thoughts are reasonable, but I think there is a tradeoff to be
made between having featureful data validation and enhanced security.
If you want your database system to validate your data, you have to send
it in plain text. If you want to have your database system not see the
plain text, then it cannot validate it. But there is still utility in it.

You can't really depend on what gets returned even in the non-encrypted
case, unless you cryptographically sign the schema against modification
or something like that. So realistically, a client that cares strongly
about the data it receives has to do some kind of client-side validation
anyway.

Note also that higher-level client libraries like JDBC effectively do
client-side data validation, for example when you call
ResultSet.getInt() etc.

This is also one of the reasons why the user facing type declaration
exists. You could just make all encrypted columns of type "opaque" or
something and not make any promises about what's inside. But client
APIs sort or rely on the application being able to ask the result set
for what's inside a column value. If we just say, we don't know, then
applications (or driver APIs) will have to be changed to accommodate
that, but the intention was to not require that. So instead we say,
it's supposed to be int, and then if it's sometimes actually not int,
then your application throws an exception you can deal with. This is
arguably a better developer experience, even if it concerns the data
type purist.

But do you have a different idea about how it should work?

> Expanding out from that I'd imagine, pie-in-the-sky
> and in some far off land, having our data type in/out validation
> functions moved to the common library and then adding client-side
> validation of the data going in/out of the encrypted columns would allow
> application developers to be able to trust what we're returning (as long
> as they're using libpq- and we'd have to document that independent
> implementations of the protocol have to provide this or just continue to
> return bytea's).

As mentioned, some client libraries effectively already do that. But
even if we could make this much more comprehensive, I don't see how this
can ever actually satisfy your point. It would require that all
participating clients apply validation all the time, and all other
clients can rely on that happening, which is impossible.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2023-03-30 20:29:36 Re: pgsql: Clean up role created in new subscription test.
Previous Message Andrew Dunstan 2023-03-30 20:09:02 Re: Thoughts on using Text::Template for our autogenerated code?