Re: Key management with tests

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Tom Kincaid <tomjohnkincaid(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
Subject: Re: Key management with tests
Date: 2021-02-02 00:47:57
Message-ID: 20210202004757.GA18043@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Feb 1, 2021 at 06:31:32PM -0500, Stephen Frost wrote:
> * Bruce Momjian (bruce(at)momjian(dot)us) wrote:
> > The purpose of cluster file encryption is to prevent users with read
> > access to the directories used to store database files and write-ahead
> > log files from being able to access the data stored in those files.
> > For example, when using cluster file encryption, users who have read
> > access to the cluster directories for backup purposes will not be able
> > to decrypt the data stored in these files. It also protects against
> > decrypted data access after media theft.
>
> That's one valid use-case and it particularly makes sense to consider,
> now that we support group read-access to the data cluster. The last

Do enough people use group read-access to be useful?

> line seems a bit unclear- I would update it to say:
> Cluster file encryption also provides data-at-rest security, protecting
> users from data loss should the physical media on which the cluster is
> stored be stolen, improperly deprovisioned (not wiped or destroyed), or
> otherwise ends up in the hands of an attacker.

I have split the section into three paragraphs, trimmed down some of the
suggested text, and added it. Full version below.

> > File system write access can allow for unauthorized file system data
> > decryption if the writes can be used to weaken the system's security
> > and this weakened system is later supplied with externally-stored keys.
>
> This isn't very clear as to exactly what the concern is or how an
> attacker would be able to thwart the system if they had write access to
> it. An attacker with write access could possibly attempt to replace the
> existing keys, but with the key wrapping that we're using, that should
> result in just a decryption failure (unless, of course, the attacker has
> the actual KEK that was used, but that's not terribly interesting to
> worry about since then they could just go access the files directly).

Uh, well, they could modify postgresql.conf to change the script to save
the secret returned by the script before returning it to the PG server.
We could require postgresql.conf to be somewhere secure, but then how do
we know that is secure? I just don't see a clean solution here, but the
idea that you write and then wait for the key to show up seems like a
very valid way of attack, and it took me a while to be able to
articulate it.

> Until and unless we solve the issue around storing the GCM tags for each
> page, we will have the risk that an attacker could modify a page in a
> manner that we wouldn't detect. This is the biggest concern that I have
> currently with the existing TDE patch sets.

Well, GCM certainly can detect page modification, but it can't detect
removing pages from the end of the table, or, since the nonce is
LSN/pageno, you could copy a page from another table that has the same
offset into another table, particularly with partitioning where the
tables have the same columns. We might be able to protect against the
later with some kind of table-id in the nonce, but I don't see how table
truncation can be detected without adding a whole lot of overhead and
complexity. And if we can't protect against those two, why bother with
detecting single-page modifications? We have to do a full job for it to
be useful.

> There's two options that I see around how to address that issue- either
> we arrange to create space in the page for the tag, such as by making
> the 'special' space on a page a bit bigger and making sure that
> everything understands that, or we'll need to add another fork in which
> we store the tags (and possibly other TDE/encryption related
> information). If we go with a fork then it should be possible to do WAL
> streaming from an unencrypted cluster to an encrypted one, which would
> be pretty neat, but it means another fork and another page that has to
> be read/written every time we modify a page. Getting some input into
> the trade-offs here would be really helpful. I don't think it's really
> reasonable to go out with TDE without having figured out the integrity
> side. Certainly, when I review things like NIST 800-53, it's very clear
> that the requirement is for both confidentiality *and* integrity.

Wow, well, if they are both required, and we can't do both, is it
valuable to do just one? Yes, we can do something later, but what if we
have no idea how to implement the second part? Your fork idea above
might need to store some table-id used for the nonce (to prevent copying
from another table) and the number of pages in the table, which fixes
the integrity check issue, but adds a lot of complexity and perhaps
overhead.

> > This also does not protect from users who have read access to system
> > memory. This also does not detect or protect against users with write
> > access from removing or modifying database files.
>
> The last seems a bit obvious, but the first sentence quoted above is
> important to make clear. I might even say:
>
> All of the pages in memory and all of the keys which are used for the
> encryption and decryption are stored in the clear in memory and
> therefore an attacker who is able to read the memory allocated by
> PostgreSQL would be able to decrypt the enitre cluster.

Same as above, full version below.

> > Given what I said above, is the value of this feature for compliance, or
> > for actual additional security? If it just compliance, are we willing
> > to add all of this code just for that, even if it has limited security
> > value? We should answer this question now, and if we don't want it,
> > let's document that so users know and can consider alternatives.
>
> The feature is for both compliance and additional security. While there
> are other ways to achieve data-at-rest encryption, they are not always
> available, for a variety of reasons.

True.

> > FYI, I don't think we can detect or protect against writers modifying
> > the data files --- even if we could do it on a block level, they could
> > remove trailing pages (might cause index lookup failures) or copy
> > pages from other tables at the same offset. Therefore, I think we can
> > only offer viewing security, not modification detection/prevention.
>
> Protecting against file modification isn't about finding some way to
> make it so that an attacker isn't able to modify the files, it's about
> detecting the case where an unauthorized modification has happened.
> Clearly if an attacker has gained write access to the system then we
> can't protect against the attacker using the access they've gained, but
> we can in most cases detect it and that's what we should be doing. It
> would be really unfortunate to end up with a solution here that only
> provides confidentiality and doesn't address integrity at all, and I
> don't really think it's *that* hard to do both. That said, if we must
> work at this in pieces and we can get agreement to handle
> confidentiality initially and then add integrity later, that might be
> reasonable.

See above.

> > > I'm happy to go look over the KMS patches again if that'd be helpful and
> > > to comment on the TDE PoC. I can also spend some time trying to improve
> >
> > I think we eventually need a full review of the TDE PoC, combined with
> > the Cybertec patch, and the wiki, to get them all aligned. However, as
> > I said already, let's get the KMS patch approved, even if we don't apply
> > it now, so we know we are on an approved foundation.
>
> While the Cybertec patch is interesting, I'd really like to see
> something that's a bit less invasive when it comes to how temporary
> files are handled. In particular, I think it'd be possible to have an
> API that's very similar to the existing one for serial reading and
> writing of files which wouldn't require nearly as many changes to things
> like reorderbuffer.c. I also believe there's some things we could do to
> avoid having to modify quite as many places when it comes to LSN
> assignment, so the base patch isn't as big.

Yes, I think we would get the best ideas from all patches.

> > > on each, as I've already done. A few of the larger concerns that I have
> > > revolve around how to store integrity information (I've tried to find a
> > > way to make room for such information in our existing page layout and,
> > > perhaps unsuprisingly, it's far from trivial to do so in a way that will
> > > avoid breaking the existing page layout, or where the same set of
> > > binaries could work on both unencrypted pages and encrypted pages with
> > > integrity validation information, and that's a problem that we really
> >
> > As stated above, I think we only need a byte or two for the hint bit
> > counter (used in the IV), as I don't think the GCM verification bytes
> > will add any additional security, and I bet we can find a byte or two.
> > We do need a separate discussion on this, either here or privately.
>
> I have to disagree here- the GCM tag adds integrity which is really
> quite important. Happy to chat about it independently, of course.

Yeah, see above.

> > > should consider trying to solve...), and how to automate key rotation
> > > (one of the nice things about Bruce's approach to storing the keys is
> > > that we're leveraging the filesystem as an index- it's easy to see how
> > > we might extend the key-per-file approach to allow us to, say, have a
> > > different key for every 32GB of LSN, but if we tried to put all of the
> > > keys into a single file then we'd have to figure out an indexing
> > > solution for it which would allow us to find the key we need to decrypt
> > > a given page...). I tend to agree with Bruce that we need to take
> >
> > Yeah, yuck on that plan. I was very happy how the per-version directory
> > worked with scripts that needed to store matching state.
>
> I don't know that it's going to ultimately be the best answer, as we're
> essentially using the filesystem as an index, as I mentioned above, but,
> yeah, trying to do all of that ourselves during WAL replay doesn't seem
> like it would be fun to try and figure out. This is an area that I
> would think we'd be able to improve on in the future too- if someone
> wants to spend the time coming up with a single-file format that is
> indexed in some manner and still provides the guarantees that we need,
> we could very likely teach pg_upgrade how to handle that and the data
> set we're talking about here is quite small, even if we've got a bunch
> of key rotation that's happened.

I thought we were going to use failover to a standby as our data key
rotation method.

Here is the full doc part you wanted improved:

The purpose of cluster file encryption is to prevent users with read
access to the directories used to store database files and write-ahead
log files from being able to access the data stored in those files.
For example, when using cluster file encryption, users who have read
access to the cluster directories for backup purposes will not be able
to decrypt the data stored in these files. It also provides data-at-rest
security, protecting users from data loss should the physical storage
media be stolen or improperly erased before disposal.

File system write access can allow for unauthorized file system data
decryption if the writes can be used to weaken the system's security
and this weakened system is later supplied with externally-stored keys.
This also does not always detect if users with write access remove or
modify database files.

This also does not protect from users who have read access to system
memory &mdash; all in-memory data pages and data encryption keys are
stored unencrypted in memory, so an attacker who is able to read the
PostgreSQL process's memory can decrypt the entire cluster.

--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EDB https://enterprisedb.com

The usefulness of a cup is in its emptiness, Bruce Lee

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jacob Champion 2021-02-02 00:55:57 Re: Support for NSS as a libpq TLS backend
Previous Message tsunakawa.takay@fujitsu.com 2021-02-02 00:44:35 RE: Commitfest 2021-01 is now closed.