Re: Transparent column encryption

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Transparent column encryption
Date: 2023-03-30 18:35:49
Message-ID: ZCXWhd25WNJ4JgQn@tamriel.snowman.net
Lists: pgsql-hackers

Greetings,

* Andres Freund (andres(at)anarazel(dot)de) wrote:
> On 2023-03-30 16:01:46 +0200, Peter Eisentraut wrote:
> > On 30.03.23 03:29, Andres Freund wrote:
> > > > One might think that, but the precedent in other equivalent systems is that
> > > > you reference the key and the algorithm separately. There is some
> > > > (admittedly not very conclusive) discussion about this near [0].
> > > >
> > > > [0]: https://www.postgresql.org/message-id/flat/00b0c4f3-0d9f-dcfd-2ba0-eee5109b4963%40enterprisedb.com#147ad6faafe8cdd2c0d2fd56ec972a40
> > >
> > > I'm very much not convinced by that. Either way, at least there should
> > > be a comment mentioning that we intentionally try to allow that.
> > >
> > > Even if this feature is something we want (why?), ISTM that this should not be
> > > implemented by having multiple fields in pg_attribute, but instead by a table
> > > referenced by pg_attribute.attcek.
> >
> > I don't know if it is clear to everyone here, but the key data model and the
> > surrounding DDL are exact copies of the equivalent MS SQL Server feature.
> > When I was first studying it, I had the exact same doubts about this. But
> > as I was learning more about it, it does make sense, because this matches a
> > common pattern in key management systems, which is relevant because these
> > keys ultimately map into KMS-managed keys in a deployment. Moreover, 1) it
> > is plausible that those people knew what they were doing, and 2) it would be
> > preferable to maintain alignment and not create something that looks the
> > same but is different in some small but important details.

I was wondering about this: is it really exactly the same, down to the
point that there's zero checking of what the data returned actually is
after it's decrypted and given to the application, and whether it
actually matches the claimed data type?

> I find it very hard to believe that details of the catalog representation like
> this will matter to users. How would it conceivably affect users that we
> store (key, encryption method) in pg_attribute vs storing an oid that's
> effectively a foreign key reference to (key, encryption method)?

I do agree with this.
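
As a rough illustration of why the two layouts look the same from the
outside, here is a small Python sketch. Every name in it other than
attcek is hypothetical and is not the patch's actual catalog layout; it
only shows that resolving a column's (key, encryption method) pair gives
the same answer either way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnEncryption:
    """The (key, encryption method) pair a column resolves to."""
    key_oid: int
    algorithm: str


@dataclass(frozen=True)
class AttributeInline:
    """Hypothetical layout A: both fields stored directly on the attribute row."""
    name: str
    key_oid: int
    algorithm: str


@dataclass(frozen=True)
class AttributeByRef:
    """Hypothetical layout B: one OID referencing a separate table."""
    name: str
    attcek: int


def resolve_inline(att: AttributeInline) -> ColumnEncryption:
    # Layout A: read the pair straight off the attribute.
    return ColumnEncryption(att.key_oid, att.algorithm)


def resolve_by_ref(att: AttributeByRef, table: dict) -> ColumnEncryption:
    # Layout B: follow the foreign-key-like reference.
    return table[att.attcek]
```

With the same key and algorithm behind both layouts, the two resolve
functions return equal values, so anything user-facing built on top of
them would not be able to tell the difference.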

> > > > With the proposed removal of usertypmod, it's only two fields: the link to
> > > > the key and the user-facing type.
> > >
> > > That feels far less clean. I think losing the ability to set the precision of
> > > a numeric, or the SRID for postgis datums won't be received very well?
> >
> > In my mind, and I probably wasn't explicit about this, I'm thinking about
> > what can be done now versus later.
> >
> > The feature is arguably useful without typmod support, e.g., for text. We
> > could ship it like that, then do some work to reorganize pg_attribute and
> > tuple descriptors to relieve some pressure on each byte, and then add the
> > typmod support back in in a future release. I think that is a workable
> > compromise.
>
> I doubt that shipping a version of column encryption that breaks our type
> system is a good idea.

And this.

I do feel that column encryption is a useful capability, and there are
large parts of this approach that I agree with. What I dislike is having
our clients be able to depend on what gets returned for non-encrypted
columns while not being able to trust what encrypted column results are,
and then trying to say it's 'transparent'. To that end, it seems like
just saying they get back a bytea, and making it clear that they have to
provide the validation themselves, would be clear while keeping much of
the rest.

Expanding out from that I'd imagine, pie-in-the-sky and in some far off
land, having our data type in/out validation functions moved to the
common library and then adding client-side validation of the data going
in/out of the encrypted columns. That would allow application developers
to trust what we're returning, as long as they're using libpq; we'd have
to document that independent implementations of the protocol have to
provide this validation or just continue to return bytea.

Not sure how we'd manage to provide support for extensions though.
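
To make the client-side validation idea a bit more concrete, here is a
minimal sketch in Python, purely for illustration. None of this is a
libpq API: the function names, the two supported types, and the
assumption that the decrypted payload is the text-format representation
of the value are all hypothetical. Types without a known validator (an
extension type, say) just come back as raw bytes, much like bytea.

```python
def validate_text(raw: bytes) -> str:
    """Accept only valid UTF-8 without NUL bytes."""
    value = raw.decode("utf-8")  # UnicodeDecodeError is a ValueError
    if "\x00" in value:
        raise ValueError("text value contains a NUL byte")
    return value


def validate_int4(raw: bytes) -> int:
    """Accept only an ASCII integer that fits in 32 bits."""
    value = int(raw.decode("ascii"))  # raises ValueError on garbage
    if not -2**31 <= value <= 2**31 - 1:
        raise ValueError("value out of range for int4")
    return value


# Hypothetical registry of client-side validators, keyed by type name.
VALIDATORS = {"text": validate_text, "int4": validate_int4}


def decode_column(claimed_type: str, decrypted: bytes):
    """Check decrypted bytes against the claimed type.

    Returns the validated value, raises ValueError on a mismatch, and
    falls back to the raw bytes when no validator is known.
    """
    validator = VALIDATORS.get(claimed_type)
    if validator is None:
        return decrypted
    return validator(decrypted)
```

So decode_column("int4", b"42") gives back an integer, while
decode_column("int4", b"not a number") raises instead of handing the
application something that doesn't match the claimed type.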

Thanks,

Stephen
