Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

From: Antonin Houska <ah(at)cybertec(dot)at>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Joe Conway <mail(at)joeconway(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, "Moon, Insung" <Moon_Insung_i3(at)lab(dot)ntt(dot)co(dot)jp>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
Date: 2019-07-19 14:02:19
Message-ID: 8930.1563544939@localhost
Lists: pgsql-hackers

Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:

> On Fri, Jul 19, 2019 at 12:04:36PM +0200, Antonin Houska wrote:
> >Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> >
> >> On Mon, Jul 15, 2019 at 03:42:39PM -0400, Bruce Momjian wrote:
> >> >On Sat, Jul 13, 2019 at 11:58:02PM +0200, Tomas Vondra wrote:
> >> >> One extra thing we should consider is authenticated encryption. We can't
> >> >> just encrypt the pages (no matter which AES mode is used - XTS/CBC/...),
> >> >> as that does not provide integrity protection (i.e. can't detect when
> >> >> the ciphertext was corrupted due to disk failure or intentionally). And
> >> >> we can't quite rely on checksums, because that checksums the plaintext
> >> >> and is stored encrypted.
> >> >
> >> >Uh, if someone modifies a few bytes of the page, we will decrypt it, but
> >> >the checksum (per-page or WAL) will not match our decrypted output. How
> >> >would they make it match the checksum without already knowing the key.
> >> >I read [1] but could not see that explained.
> >> >
> >>
> >> Our checksum is only 16 bits, so perhaps one way would be to just
> >> generate 64k of randomly modified pages and hope one of them happens to
> >> hit the right checksum value. Not sure how practical such attack is, but
> >> it does require just filesystem access.
> >
> >I don't think you can easily generate 64k of different checksums this way. If
> >the data is random, I suppose that each set of 2^(128 - 16) blocks will
> >contain the same checksum after decryption. Thus even if you generate 64k of
> >different ciphertext blocks that contain the checksum, some (many?) checksums
> >will be duplicates. Unfortunately the math to describe this problem does not
> >seem to be trivial.
> >
>
> I'm not sure what's your point, or why you care about the 128 bits, but I
> don't think the math is very complicated (and it's exactly the same with
> or without encryption). The probability of checksum collision for randomly
> modified page is 1/64k, so p=~0.00153%. So probability of *not* getting a
> collision is (1-p)=99.9985%. So with N pages, the probability of no
> collisions is pow((1-p),N) which behaves like this:
>
> N pow((1-p),N)
> --------------------
> 10000 85%
> 20000 73%
> 30000 63%
> 46000 49%
> 200000 4%
>
> So with 1.6GB relation you have about 96% chance of a checksum collision.
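
For the record, the figures above are easy to reproduce; a quick sketch
(assuming 8 kB pages, and p = 1/65536 as stated):

```python
# Reproduce the no-collision probabilities quoted above: p is the chance
# that one randomly modified page happens to decrypt to data matching
# its 16-bit checksum, so (1-p)^N is the chance of N pages with no hit.
p = 1 / 65536

for n in [10000, 20000, 30000, 46000, 200000]:
    no_collision = (1 - p) ** n
    size_gb = n * 8192 / 1e9
    print(f"{n:>7} pages ({size_gb:4.1f} GB): {no_collision:6.1%} chance of no collision")
```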

I thought your attack proposal was to find a valid (encrypted) checksum for a
given encrypted page. Instead it seems you were only saying that it's not too
hard to generate a page with a valid checksum in general. Thus the attacker
can modify the ciphertext again and again in a way that is not quite random,
and the chance of passing the checksum verification may still be relatively
high.

> >Also note that if you try to generate ciphertext, decryption of which will
> >result in particular value of checksum, you can hardly control the other 14
> >bytes of the block, which in turn are used to verify the checksum.
> >
>
> Now, I'm not saying this attack is particularly practical - it would
> generate a fair number of checkpoint failures before getting the first
> collision. So it'd trigger quite a few alerts, I guess.

You probably mean "checksum failures". I agree. And even if the checksum
passed verification, the page or tuple headers would probably be invalid and
cause other errors.

> >> FWIW our CRC algorithm is not quite HMAC, because it's neither keyed nor
> >> a cryptographic hash algorithm. Now, maybe we don't want authenticated
> >> encryption (e.g. XTS is not authenticated, unlike GCM/CCM).
> >
> >I'm also not sure if we should try to guarantee data authenticity /
> >integrity. As someone already mentioned elsewhere, page MAC does not help if
> >the whole page is replaced. (An extreme case is that old filesystem snapshot
> >containing the whole data directory is restored, although that will probably
> >make the database crash soon.)
> >
> >We can guarantee integrity and authenticity of backup, but that's a separate
> >feature: someone may need this although it's o.k. for him to run the cluster
> >unencrypted.
> >
>
> Yes, I do agree with that. I think attempts to guarantee data authenticity
> and/or integrity at the page level is mostly futile (replay attacks are an
> example of why). IMHO we should consider that to be outside the threat
> model TDE is expected to address.

When writing my previous email I forgot that, besides improving data
integrity, authenticated encryption also tries to detect attempts to recover
the encryption key via a chosen-ciphertext attack (CCA). The fact that pages
are encrypted / decrypted independently of each other should not be a problem
here. We just need to consider whether this kind of CCA is a threat we try to
protect against.
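
To illustrate what authenticated encryption buys over a plain checksum, here
is a toy encrypt-then-MAC sketch. It is stdlib only: a SHA-256 counter-mode
keystream stands in for a real cipher (e.g. AES-GCM), and the keys, nonce and
page contents are made up for the example:

```python
import hashlib
import hmac

KEY_ENC = b"enc-key-demo"   # made-up demo keys, not a real key-management scheme
KEY_MAC = b"mac-key-demo"

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy keystream from SHA-256 in counter mode -- a stand-in for AES.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def seal(page: bytes, nonce: bytes) -> bytes:
    # Encrypt, then append an HMAC tag over nonce + ciphertext.
    ct = bytes(a ^ b for a, b in zip(page, keystream(KEY_ENC, nonce, len(page))))
    tag = hmac.new(KEY_MAC, nonce + ct, hashlib.sha256).digest()
    return ct + tag

def open_sealed(blob: bytes, nonce: bytes) -> bytes:
    # Verify the tag before decrypting; any modification is rejected.
    ct, tag = blob[:-32], blob[-32:]
    expected = hmac.new(KEY_MAC, nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed: ciphertext was modified")
    return bytes(a ^ b for a, b in zip(ct, keystream(KEY_ENC, nonce, len(ct))))

page = b"page payload " * 4
nonce = b"blk-0001"
blob = seal(page, nonce)
assert open_sealed(blob, nonce) == page          # round trip works
tampered = bytes([blob[0] ^ 0x01]) + blob[1:]    # flip one ciphertext bit
try:
    open_sealed(tampered, nonce)
except ValueError:
    pass                                         # tampering detected before decryption
```

Unlike a 16-bit checksum, forging the tag without the MAC key is not feasible
by brute force, so the "try 64k modified pages" attack disappears. As noted
above, though, this still does not stop replay of a whole valid page.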

> IMO a better way to handle authenticity/integrity would be based on WAL,
> which is essentially an authoritative log of operations. We should be able
> to parse WAL, deduce expected state (min LSN, checksums) for each page,
> and validate the cluster state based on that.

OK. A replica that was cloned from the master before any corruption could have
happened could be used for such checks. But that should be done by an external
tool rather than by the PG core.

> I still think having to decrypt the page in order to verify a checksum
> (because the header is part of the encrypted page, and is computed from
> the plaintext version) is not great.

Should we forbid checksums if the cluster is encrypted? Even if the checksum
is stored encrypted, I think it can still help to detect I/O corruption: if
the encrypted data is corrupted, checksum verification should fail after
decryption anyway.
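
A toy sketch of that point, with an XOR keystream as a stand-in cipher and a
16-bit additive checksum as a stand-in for PostgreSQL's page checksum (the
key and page contents are made up):

```python
import hashlib

def checksum16(data: bytes) -> int:
    # Stand-in for the real page checksum: 16-bit byte sum.
    return sum(data) & 0xFFFF

def xor_stream(key: bytes, data: bytes) -> bytes:
    # Toy XOR stream cipher (its own inverse), a stand-in for AES.
    ks = b""
    counter = 0
    while len(ks) < len(data):
        ks += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, ks))

payload = b"tuple data " * 8
page = checksum16(payload).to_bytes(2, "big") + payload   # checksum stored in the page
encrypted = xor_stream(b"demo-key", page)                 # checksum is encrypted too

# Simulate I/O corruption of the ciphertext on disk.
corrupted = bytearray(encrypted)
corrupted[10] ^= 0xFF
decrypted = xor_stream(b"demo-key", bytes(corrupted))

stored = int.from_bytes(decrypted[:2], "big")
assert stored != checksum16(decrypted[2:])   # verification fails after decryption
```

With a block mode such as CBC or XTS a single flipped bit would garble a whole
16-byte block instead of one byte, so an accidental checksum match after
corruption stays at roughly the 1/65536 level discussed upthread.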

--
Antonin Houska
Web: https://www.cybertec-postgresql.com
