Re: Transparent column encryption

From: Jacob Champion <jchampion(at)timescale(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Transparent column encryption
Date: 2022-07-26 23:19:38
Message-ID: d3afc407-973f-dc0e-4776-65678f31ff8e@timescale.com
Lists: pgsql-hackers

On 7/26/22 13:25, Robert Haas wrote:
> I certainly have no objection to being clear about such details in the
> documentation.

Cool.

> I fear the phenomenon where
> doing anything about a problem makes you responsible for the whole
> problem. If we disclaim the ability to hide the length of values,
> that's clear enough.

I don't think disclaiming responsibility absolves you of it here, in
part because choices are being made (text format) that make length
hiding even more important than it otherwise would be. A user who
already knows that encryption doesn't hide length might still reasonably
expect a fixed-length column type like bigint to be protected in all
cases. It won't be (at least not with your 16-byte example).

And sure, you can document that caveat too, but said user might then
reasonably wonder how they're supposed to actually make it safe.
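To make the bigint point concrete: in text format a bigint's plaintext is its decimal representation, anywhere from 1 to 20 bytes. The sketch below assumes, purely for illustration, that plaintexts are padded up to the next multiple of 16 bytes with at least one byte of padding. That is my reading of the 16-byte example, not necessarily what the patch does, and `text_bigint` and `padded_len` are hypothetical helpers.

```python
BLOCK = 16  # assumed padding granularity, for illustration only

def text_bigint(v: int) -> bytes:
    """Text-format output of a bigint: plain decimal, optional leading '-'."""
    if not -2**63 <= v < 2**63:
        raise OverflowError("out of range for bigint")
    return str(v).encode("ascii")

def padded_len(plaintext: bytes, block: int = BLOCK) -> int:
    """Length after padding up to the next multiple of `block`,
    always adding at least one padding byte (as PKCS#7-style schemes do)."""
    return (len(plaintext) // block + 1) * block

for v in (7, 12345, 10**15 - 1, 10**15, -2**63):
    pt = text_bigint(v)
    print(f"{v:>21}  text={len(pt):2} bytes  padded={padded_len(pt)} bytes")
```

Under that assumption, every value of up to 15 characters lands in the 16-byte bucket and everything longer lands in the 32-byte bucket, so the ciphertext length reveals whether |v| is at least 10^15 (10^14 for negative values). By contrast, bigint's binary send format is always 8 bytes.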

> But if we start padding to try to hide the length
> of values, then people might expect it to work in all cases, and I
> don't see how it ever can.

Well, that's where I agree with you on the value of solid documentation.
But there are other things we can do as well. In general we should

- choose a default that will protect most people out of the box,
- document the heck out of the default's limitations,
- provide guardrails that warn the user when they're outgrowing those
limitations, and
- give people a way to tune it to their own use cases.

As an example, a naive guardrail in this instance could be to simply
have the client refuse to encrypt data past the padding maximum, if
you've gone so far as to set one up. It'd suck to hit that maximum in
production and have to rewrite the column, but you did want your
encryption to hide your data, right?

Maybe that's way too complex to think about for a v1, but it'll be
easier to maintain this into the future if there's at least a plan to
create a v2. If you declare it out of scope, instead of considering it a
potential TODO, then I think it'll be a lot harder for people to improve it.

> Moreover, I think that the padding might
> need to be done in a "cryptographically intelligent" way rather than
> just, say, adding trailing blanks.

Possibly. I think that's where AEAD comes in -- if you've authenticated
your ciphertext sufficiently, padding oracles should be prohibitively
difficult(?). (But see below; I think we also have other things to worry
about in terms of authentication and oracles.)

> Now that being said, if Peter wants
> to implement something around padding that he has reason to believe
> will not create cryptographic weaknesses, I have no issue with that. I
> just don't view it as an essential part of the feature, because hiding
> such things doesn't seem like it can ever be the main point of a
> feature like this.

I think that consideration of side channels has to be an essential part
of any cryptography feature. Recent history has shown "obscure" side
channels gaining enough power to completely break crypto schemes.

And it's not like TLS where we have to protect an infinite stream of
arbitrary bytes; this is going to be used on small values that probably
get repeated often and have (comparatively) very little entropy.
Cryptanalysis based on length seems to me like part and parcel of the
problem space.
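For a small, low-entropy domain, length alone can be a complete decoding table. The sketch below assumes a length-preserving AEAD with no padding and a constant per-value overhead; the 28 bytes (a 12-byte nonce plus a 16-byte tag, as with AES-GCM) is an assumption, and only its constancy matters. `ciphertext_len` and `length_classes` are hypothetical.

```python
from collections import defaultdict

# Assumed constant per-value overhead; the exact number doesn't matter.
OVERHEAD = 28

def ciphertext_len(value: str) -> int:
    """Ciphertext length under a length-preserving AEAD with no padding."""
    return len(value.encode("utf-8")) + OVERHEAD

def length_classes(domain):
    """Group candidate plaintexts by the ciphertext length an observer would see."""
    classes = defaultdict(list)
    for v in domain:
        classes[ciphertext_len(v)].append(v)
    return dict(classes)

print(length_classes(["yes", "no", "unknown"]))  # every length maps to exactly one value
print(length_classes(["t", "f"]))                # boolean text output: one shared length
```

Any length class containing a single value is fully disclosed to whoever can see the stored ciphertext.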

> I guess my view on this is that, if you're trying to hide something
> like a credit card number, most likely every value in the system is
> the same length, and then this is a non-issue.

Agreed.

> On the other hand, if
> the secret column is a person's name, then there is an issue, but
> you're not going to pad every value out to the maximum length of a
> varlena, so you have to make an estimate of how long a name someone
> might reasonably have to decide how much padding to include. You also
> have to decide whether the storage cost of padding every value is
> worth it to you given the potential information leakage. Only the
> human user can make those decisions, so some amount of "putting that
> on the user" feels inevitable.

Agreed.
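The decision Robert describes can at least be quantified per column. This sketch, under the assumption of bucketed padding (pad up to the next multiple of a chosen bucket size, at least one pad byte), reports how many distinct ciphertext lengths remain visible and how much padding is stored. The names and the sample data are hypothetical illustrations, not measurements.

```python
def bucketed_len(n: int, bucket: int) -> int:
    """Size after padding n bytes up to the next multiple of `bucket` (>= 1 pad byte)."""
    return (n // bucket + 1) * bucket

def tradeoff(lengths, bucket):
    """Return (distinct visible lengths, total padding bytes) for a bucket size."""
    padded = [bucketed_len(n, bucket) for n in lengths]
    return len(set(padded)), sum(padded) - sum(lengths)

# Hypothetical sample of names; a real deployment would look at its own data.
names = ["Al", "Maria", "Jacob", "Guillermo", "Robert"]
lengths = [len(n.encode("utf-8")) for n in names]
for bucket in (1, 4, 16):
    distinct, overhead = tradeoff(lengths, bucket)
    print(f"bucket={bucket:2}: {distinct} distinct lengths, {overhead} bytes of padding")
```

A single bucket large enough for every value amounts to fixed-width padding; a name of 16 bytes or more would spill into a second bucket, which is where a guardrail that refuses oversized values would come in.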

> Now, if we don't have a padding system
> built into the feature, then that does put even more on the user; it's
> hard to argue with that.

Right. If they can even fix it at all. Having a well-documented padding
feature would not only help mitigate that, it would conveniently hang a
big sign on the caveats that exist.

--

Speaking of oracles and side channels: users may want to use associated
data to further lock an encrypted value to its column type, too.
Otherwise it seems like an active DBA could feed an encrypted text blob
to a client in place of, say, an int column, to see whether or not that
text blob is a number. Seems like AD is going to be important to prevent
active attacks in general.
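A sketch of binding a ciphertext to its column and type through associated data. To stay standard-library-only it uses a TOY encrypt-then-MAC construction (an HMAC-SHA256 keystream plus an HMAC-SHA256 tag over the AD); it is for illustration and must not be used for real data. Real code would use a vetted AEAD such as AES-GCM, e.g. `AESGCM(key).encrypt(nonce, data, associated_data)` from the Python `cryptography` package. The AD layout and the `seal`/`open_sealed`/`column_ad` names are hypothetical, not what the patch does.

```python
import hashlib
import hmac
import os

NONCE_LEN, TAG_LEN = 16, 32

def _keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    # TOY keystream: HMAC-SHA256 used as a PRF in counter mode.
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:n])

def _tag(mac_key: bytes, ad: bytes, nonce: bytes, ct: bytes) -> bytes:
    # Length-prefix the AD so the AD/ciphertext boundary can't be shifted.
    msg = len(ad).to_bytes(8, "big") + ad + nonce + ct
    return hmac.new(mac_key, msg, hashlib.sha256).digest()

def seal(enc_key: bytes, mac_key: bytes, plaintext: bytes, ad: bytes) -> bytes:
    nonce = os.urandom(NONCE_LEN)
    ks = _keystream(enc_key, nonce, len(plaintext))
    ct = bytes(p ^ k for p, k in zip(plaintext, ks))
    return nonce + ct + _tag(mac_key, ad, nonce, ct)

def open_sealed(enc_key: bytes, mac_key: bytes, blob: bytes, ad: bytes) -> bytes:
    if len(blob) < NONCE_LEN + TAG_LEN:
        raise ValueError("ciphertext too short")
    nonce, ct, tag = blob[:NONCE_LEN], blob[NONCE_LEN:-TAG_LEN], blob[-TAG_LEN:]
    # Verify before decrypting: a different column or type means a different AD.
    if not hmac.compare_digest(tag, _tag(mac_key, ad, nonce, ct)):
        raise ValueError("authentication failed (wrong key or associated data)")
    ks = _keystream(enc_key, nonce, len(ct))
    return bytes(c ^ k for c, k in zip(ct, ks))

def column_ad(table: str, column: str, type_name: str) -> bytes:
    # Hypothetical AD layout: NUL-separated identifiers (PostgreSQL text can't contain NUL).
    return b"\x00".join(s.encode("utf-8") for s in (table, column, type_name))

enc_key, mac_key = os.urandom(32), os.urandom(32)
blob = seal(enc_key, mac_key, b"hello", column_ad("people", "name", "text"))
# An active DBA moves the blob into an int4 column. The client checks it against
# that column's AD and rejects it before any int4 parsing, so there is no oracle.
try:
    open_sealed(enc_key, mac_key, blob, column_ad("people", "age", "int4"))
except ValueError as err:
    print("rejected:", err)
```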

--Jacob