Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Joe Conway <mail(at)joeconway(dot)com>, Antonin Houska <ah(at)cybertec(dot)at>, Stephen Frost <sfrost(at)snowman(dot)net>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, "Moon, Insung" <Moon_Insung_i3(at)lab(dot)ntt(dot)co(dot)jp>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
Date: 2019-07-15 22:11:41
Message-ID: 20190715221141.rrrewsqxlx4umhja@momjian.us
Lists: pgsql-hackers

On Mon, Jul 15, 2019 at 11:05:30PM +0200, Tomas Vondra wrote:
> On Mon, Jul 15, 2019 at 03:42:39PM -0400, Bruce Momjian wrote:
> > On Sat, Jul 13, 2019 at 11:58:02PM +0200, Tomas Vondra wrote:
> > > One extra thing we should consider is authenticated encryption. We can't
> > > just encrypt the pages (no matter which AES mode is used - XTS/CBC/...),
> > > as that does not provide integrity protection (i.e. can't detect when
> > > the ciphertext was corrupted due to disk failure or intentionally). And
> > > we can't quite rely on checksums, because that checksums the plaintext
> > > and is stored encrypted.
> >
> > Uh, if someone modifies a few bytes of the page, we will decrypt it, but
> > the checksum (per-page or WAL) will not match our decrypted output. How
> > would they make it match the checksum without already knowing the key?
> > I read [1] but could not see that explained.
> >
>
> Our checksum is only 16 bits, so perhaps one way would be to just
> generate 64k of randomly modified pages and hope one of them happens to
> hit the right checksum value. Not sure how practical such attack is, but
> it does require just filesystem access.

Yes, that would work, and it raises the question of whether our checksum
is big enough for this. If it is not, we need to find space for a larger
one, probably with a custom encrypted page format. :-( And that makes
adding encryption offline almost impossible because you potentially have
to move tuples around. Yuck!
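
To make the 64k figure concrete, here is a rough simulation of that attack.
It is only a sketch under loud assumptions: toy_checksum() is a stand-in
(CRC-32 truncated to 16 bits), not our real page checksum, and "tampering"
is modelled as one 16-byte block decrypting to random garbage.

```python
import random
import zlib

PAGE_SIZE = 8192  # PostgreSQL's default block size


def toy_checksum(data: bytes) -> int:
    """Stand-in 16-bit checksum; NOT PostgreSQL's real page checksum."""
    return zlib.crc32(data) & 0xFFFF


def trials_until_forgery(rng: random.Random,
                         page_size: int = PAGE_SIZE,
                         max_trials: int = 5_000_000) -> int:
    """Count tampered pages tried until one passes the checksum by chance."""
    body = bytearray(rng.getrandbits(8 * page_size).to_bytes(page_size, "little"))
    stored = toy_checksum(bytes(body))  # value the server expects to see
    for trial in range(1, max_trials + 1):
        # Tampering with one ciphertext block makes that block decrypt to
        # garbage, modelled here as 16 fresh random bytes of plaintext.
        body[0:16] = rng.getrandbits(128).to_bytes(16, "little")
        if toy_checksum(bytes(body)) == stored:
            return trial
    raise RuntimeError("no colliding page found within max_trials")
```

Calling trials_until_forgery(random.Random(2019)) returns how many tampered
pages were needed. With a 16-bit check each attempt passes with probability
1/65536, so the average should be on the order of 65536 attempts.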

> FWIW our CRC algorithm is not quite HMAC, because it's neither keyed nor
> a cryptographic hash algorithm. Now, maybe we don't want authenticated
> encryption (e.g. XTS is not authenticated, unlike GCM/CCM).

I thought just encrypting the CRC value would be enough to detect
changes, but you are right that someone could just try 64k pages until
one hit.
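
For contrast, a keyed MAC computed over the ciphertext (the Encrypt-then-MAC
arrangement you mention) might look roughly like the sketch below. The names
seal() and open_sealed() are made up for illustration, HMAC-SHA256 is just
one possible choice, and the sketch says nothing about where the 32 tag
bytes would live in a page.

```python
import hashlib
import hmac

TAG_LEN = 32  # bytes in an HMAC-SHA256 tag, versus 2 for the page checksum


def seal(mac_key: bytes, ciphertext: bytes) -> bytes:
    """Append a keyed tag computed over the already-encrypted page."""
    tag = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    return ciphertext + tag


def open_sealed(mac_key: bytes, sealed: bytes) -> bytes:
    """Verify the tag before any decryption; raise if the page was altered."""
    ciphertext, tag = sealed[:-TAG_LEN], sealed[-TAG_LEN:]
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("page failed authentication")
    return ciphertext
```

With a 256-bit keyed tag the guess-until-it-matches approach is no longer
practical, which is the property the 16-bit checksum cannot give us.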

> > This post discussed it:
> >
> > https://crypto.stackexchange.com/questions/202/should-we-mac-then-encrypt-or-encrypt-then-mac
> >
> > I realize in a new system we might prefer encrypt-then-mac, TLS and SSL
> > do it differently, and I don't think the security problems of
> > MAC-then-Encrypt apply to our use-case, e.g. API programming errors.
> >
> > If we want to go crazy, we could encrypt, assume zeros for the CRC,
> > compute the MAC and put it in the place where the CRC is, but then tools
> > that read CRC would see that as an error, so we don't want to go there.
> > Yes, crazy.
> >
> > > Which seems pretty annoying, because then the checksums won't verify
> > > data as sent to the storage system, and verify checksums would require
> > > access to all keys (how do you do that in offline mode?).
> >
> > Uh, the keys are stored in a PGDATA file --- seems simple enough, but we
> > would either have to do whole-cluster encryption or have some per-page
> > encryption flag.
> >
>
> And how do you know which files are encrypted and which are not, and
> which keys are used for which file? Presumably that's in some system
> catalog, which is not available in offline mode.

You would need either all-cluster encryption (no need to check) or a
per-page bit that says the page is encrypted, and the bit has to be in
the part of the page that is not encrypted, e.g., near LSN.
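
As a rough illustration of the per-page bit, an offline tool could test a
flag in pd_flags without any key or catalog access. PD_ENCRYPTED is a
hypothetical bit that does not exist today, and the sketch assumes a
little-endian page image with pd_flags at byte offset 10 (after the 8-byte
pd_lsn and the 2-byte pd_checksum).

```python
import struct

PD_ENCRYPTED = 0x0008  # hypothetical flag bit, not defined by PostgreSQL
PD_FLAGS_OFFSET = 10   # pd_lsn (8 bytes) + pd_checksum (2 bytes) come first


def page_is_encrypted(page: bytes) -> bool:
    """Read the flag from the unencrypted header; no key is needed."""
    (pd_flags,) = struct.unpack_from("<H", page, PD_FLAGS_OFFSET)
    return bool(pd_flags & PD_ENCRYPTED)


def mark_encrypted(page: bytearray) -> None:
    """Set the hypothetical flag in place, leaving the other flag bits alone."""
    (pd_flags,) = struct.unpack_from("<H", page, PD_FLAGS_OFFSET)
    struct.pack_into("<H", page, PD_FLAGS_OFFSET, pd_flags | PD_ENCRYPTED)
```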

> > > But the main issue with checksum-then-encrypt is it's essentially
> > > "MAC-then-Encrypt" and that does not provide Authenticated Encryption
> > > security - see [1]. We should be looking at "Encrypt-then-MAC" instead,
> > > in which case we'll need to store the MAC somewhere (probably in the
> > > same place as the nonce/IV/key/... for each page).
> >
> > I don't think we are planning to store the nonce/IV on each page but
> > rather use the LSN (already on the page), and perhaps in addition, the
> > page number.
>
> But the LSN is in the page header, and AFAICS the page header is
> encrypted. So how do you decrypt the page without knowing the LSN (which
> I think you need to know in order to derive the IV)?

My proposal was that the first 16 bytes of the page are not encrypted.
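
Roughly, the reader would take the LSN from the clear prefix, combine it
with the page number to get the IV, and only then decrypt the rest. The
sketch below is an assumption-laden illustration: the IV layout (LSN, block
number, zero padding) is one possibility, not a settled design, and the
SHAKE-256 keystream is only a placeholder for a real AES mode because it is
available in the standard library.

```python
import hashlib
import struct

CLEAR_BYTES = 16  # leading bytes left unencrypted, per the proposal above


def derive_iv(page: bytes, block_number: int) -> bytes:
    """Build a 16-byte IV from the clear-text LSN and the page number."""
    xlogid, xrecoff = struct.unpack_from("<II", page, 0)  # pd_lsn halves
    lsn = (xlogid << 32) | xrecoff
    return struct.pack("<QII", lsn, block_number, 0)  # LSN, block no., zero pad


def _keystream(key: bytes, iv: bytes, length: int) -> bytes:
    """Placeholder keystream (SHAKE-256); a real design would use AES."""
    return hashlib.shake_256(key + iv).digest(length)


def crypt_page(key: bytes, page: bytes, block_number: int) -> bytes:
    """Encrypt or decrypt everything after the clear prefix (XOR is symmetric)."""
    iv = derive_iv(page, block_number)
    body = page[CLEAR_BYTES:]
    stream = _keystream(key, iv, len(body))
    return page[:CLEAR_BYTES] + bytes(a ^ b for a, b in zip(body, stream))
```

Because the prefix is never touched, the same call both encrypts and
decrypts, and the IV can always be recomputed from the page as stored.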

> Also, we probably don't want to expose the checksum, because it may
> reveal information about page contents (since it's not a HMAC).

Uh, I have not heard of that as an issue.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
