Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Joe Conway <mail(at)joeconway(dot)com>, Antonin Houska <ah(at)cybertec(dot)at>, Stephen Frost <sfrost(at)snowman(dot)net>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, "Moon, Insung" <Moon_Insung_i3(at)lab(dot)ntt(dot)co(dot)jp>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
Date: 2019-07-15 21:05:30
Message-ID: 20190715210530.aygejkdzelfpyw4u@development
Lists: pgsql-hackers

On Mon, Jul 15, 2019 at 03:42:39PM -0400, Bruce Momjian wrote:
>On Sat, Jul 13, 2019 at 11:58:02PM +0200, Tomas Vondra wrote:
>> One extra thing we should consider is authenticated encryption. We can't
>> just encrypt the pages (no matter which AES mode is used - XTS/CBC/...),
>> as that does not provide integrity protection (i.e. can't detect when
>> the ciphertext was corrupted due to disk failure or intentionally). And
>> we can't quite rely on checksums, because that checksums the plaintext
>> and is stored encrypted.
>
>Uh, if someone modifies a few bytes of the page, we will decrypt it, but
>the checksum (per-page or WAL) will not match our decrypted output. How
>would they make it match the checksum without already knowing the key?
>I read [1] but could not see that explained.
>

Our checksum is only 16 bits, so one way might be to just generate
64k randomly modified pages and hope one of them happens to hit the
right checksum value. Not sure how practical such an attack is, but it
requires only filesystem access.
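A rough simulation of that trial-and-error approach follows. It is a minimal sketch under these assumptions: the 16-bit checksum is modeled as CRC-32 truncated to 16 bits (a hypothetical stand-in, not PostgreSQL's actual page checksum algorithm), the cipher is XTS, so tampering with one ciphertext block decrypts to an unpredictable 16-byte plaintext block, and the stored checksum sits outside the tampered block and stays intact. The helper names short_checksum and forge are illustrative only.

```python
import random
import zlib
from typing import Tuple

PAGE_SIZE = 8192  # PostgreSQL's default block size
AES_BLOCK = 16    # AES block size in bytes


def short_checksum(data: bytes, bits: int = 16) -> int:
    # Hypothetical stand-in: CRC-32 truncated to `bits` bits. PostgreSQL's
    # real page checksum is a different algorithm; only the width matters here.
    return zlib.crc32(data) & ((1 << bits) - 1)


def forge(page: bytes, bits: int = 16, seed: int = 0) -> Tuple[int, bytes]:
    """Garble one random 16-byte block per attempt until the page still
    matches the stored checksum. Returns (attempts, forged_page).

    Models XTS, where tampering with one ciphertext block turns the
    matching plaintext block into unpredictable bytes after decryption."""
    rng = random.Random(seed)
    stored = short_checksum(page, bits)  # value kept with the page, assumed untouched
    nblocks = len(page) // AES_BLOCK
    attempts = 0
    while True:
        attempts += 1
        forged = bytearray(page)
        off = rng.randrange(nblocks) * AES_BLOCK
        garbage = rng.getrandbits(8 * AES_BLOCK).to_bytes(AES_BLOCK, "little")
        forged[off:off + AES_BLOCK] = garbage
        candidate = bytes(forged)
        if candidate != page and short_checksum(candidate, bits) == stored:
            return attempts, candidate
```

With a well-distributed 16-bit checksum each attempt succeeds with probability 2^-16, so forge(bytes(PAGE_SIZE)) needs about 65,536 attempts on average, which is cheap for anyone who can rewrite the data file.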

FWIW our CRC algorithm is not quite an HMAC, because it's neither keyed
nor a cryptographic hash algorithm. Now, maybe we don't want authenticated
encryption (e.g. XTS is not authenticated, unlike GCM/CCM).

>This post discussed it:
>
> https://crypto.stackexchange.com/questions/202/should-we-mac-then-encrypt-or-encrypt-then-mac
>
>I realize that in a new system we might prefer encrypt-then-MAC, but TLS
>and SSL do it differently, and I don't think the security problems of
>MAC-then-Encrypt, e.g. API programming errors, apply to our use case.
>
>If we want to go crazy, we could encrypt, assume zeros for the CRC,
>compute the MAC and put it where the CRC is, but then tools
>that read the CRC would see that as an error, so we don't want to go there.
>Yes, crazy.
>
>> Which seems pretty annoying, because then the checksums won't verify
>> data as sent to the storage system, and verifying checksums would require
>> access to all keys (how do you do that in offline mode?).
>
>Uh, the keys are stored in a PGDATA file --- seems simple enough, but we
>would either have to do whole-cluster encryption or have some per-page
>encryption flag.
>

And how do you know which files are encrypted and which are not, and
which keys are used for which file? Presumably that's in some system
catalog, which is not available in offline mode.

>> But the main issue with checksum-then-encrypt is it's essentially
>> "MAC-then-Encrypt" and that does not provide Authenticated Encryption
>> security - see [1]. We should be looking at "Encrypt-then-MAC" instead,
>> in which case we'll need to store the MAC somewhere (probably in the
>> same place as the nonce/IV/key/... for each page).
>
>I don't think we are planning to store the nonce/IV on each page but
>rather use the LSN (already on the page), and perhaps in addition, the
>page number.

But the LSN is in the page header, and AFAICS the page header is
encrypted. So how do you decrypt the page without knowing the LSN (which
I think you need to know in order to derive the IV)?
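One way around that chicken-and-egg problem is to leave the 24-byte page header (including the LSN) unencrypted and to store a per-page Encrypt-then-MAC tag. Below is a minimal sketch of that idea, under several assumptions: the nonce layout (64-bit LSN plus 32-bit block number) is hypothetical, bytes 0-7 of the header are read as a little-endian LSN (PostgreSQL's actual pd_lsn encoding differs in detail), HMAC-SHA256 in counter mode stands in for AES-CTR because Python's standard library has no AES, and the tag is stored outside the page. It also assumes every rewrite of a page advances its LSN, since reusing a nonce with a stream cipher leaks the XOR of the two plaintexts.

```python
import hashlib
import hmac
import struct
from typing import Tuple

PAGE_SIZE = 8192
HEADER_SIZE = 24  # sizeof(PageHeaderData); left in plaintext so the LSN is readable


def derive_nonce(lsn: int, blkno: int) -> bytes:
    # Hypothetical layout: 64-bit LSN followed by 32-bit block number.
    return struct.pack("<QI", lsn, blkno)


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Stand-in for AES-CTR: HMAC-SHA256 used as a PRF over (nonce, counter).
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + struct.pack("<I", counter), hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def read_lsn(header: bytes) -> int:
    # Assumption: bytes 0-7 hold the LSN as a little-endian 64-bit integer.
    return struct.unpack_from("<Q", header, 0)[0]


def mac_input(blkno: int, header: bytes, ciphertext: bytes) -> bytes:
    # Binding the block number stops a valid page from being copied elsewhere.
    return struct.pack("<I", blkno) + header + ciphertext


def seal_page(enc_key: bytes, mac_key: bytes, page: bytes, blkno: int) -> Tuple[bytes, bytes]:
    """Encrypt the page body, then MAC block number + header + ciphertext."""
    header, body = page[:HEADER_SIZE], page[HEADER_SIZE:]
    nonce = derive_nonce(read_lsn(header), blkno)
    ciphertext = xor(body, keystream(enc_key, nonce, len(body)))
    tag = hmac.new(mac_key, mac_input(blkno, header, ciphertext), hashlib.sha256).digest()
    return header + ciphertext, tag  # the tag has to be stored outside the page


def open_page(enc_key: bytes, mac_key: bytes, sealed: bytes, tag: bytes, blkno: int) -> bytes:
    """Verify the tag first; decrypt only if it matches."""
    header, ciphertext = sealed[:HEADER_SIZE], sealed[HEADER_SIZE:]
    expected = hmac.new(mac_key, mac_input(blkno, header, ciphertext), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        raise ValueError("page failed authentication")
    # The header is authenticated now, so its LSN can be trusted for the nonce.
    nonce = derive_nonce(read_lsn(header), blkno)
    return header + xor(ciphertext, keystream(enc_key, nonce, len(ciphertext)))
```

Because the MAC covers the ciphertext, corruption or tampering is detected before any decryption happens, which is the property MAC-then-Encrypt (checksum-then-encrypt) does not give us.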

Also, we probably don't want to expose the checksum, because it may
reveal information about page contents (since it's not an HMAC).
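A sketch of how an exposed, unkeyed checksum leaks information: a 16-bit value cannot identify an arbitrary 8kB page, but if an attacker can narrow the contents down to a modest set of candidates, the checksum filters out nearly all wrong guesses. The checksum here is a hypothetical CRC-32 truncated to 16 bits, and the "salary=" rows are made-up example data.

```python
import zlib
from typing import Iterable, List


def exposed_checksum(plaintext: bytes) -> int:
    # Hypothetical unkeyed 16-bit checksum of the plaintext, stored unencrypted.
    return zlib.crc32(plaintext) & 0xFFFF


def consistent_guesses(exposed: int, candidates: Iterable[bytes]) -> List[bytes]:
    # Keep only the candidate plaintexts whose checksum matches the exposed value.
    return [c for c in candidates if exposed_checksum(c) == exposed]


# Suppose the attacker knows the page holds one of 500 similar rows.
candidates = [f"salary={n}".encode().ljust(8192, b"\0") for n in range(50000, 50500)]
secret = candidates[123]
matches = consistent_guesses(exposed_checksum(secret), candidates)
# The secret is always among the matches; almost all other guesses are ruled out.
print(len(matches), secret in matches)
```

An HMAC keyed with a secret avoids this, because the attacker cannot compute the tag for a guessed plaintext.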

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
