Re: Moving forward with TDE

From:	Stephen Frost <sfrost(at)snowman(dot)net>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Chris Travers <chris(dot)travers(at)gmail(dot)com>, David Christensen <david+pg(at)pgguru(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: Moving forward with TDE
Date:	2023-03-27 22:57:42
Message-ID:	CAOuzzgqvzgWiQfc97dFUE6q3G0vtZO4nZCo-OUCRmG-gU3+KxA@mail.gmail.com
Lists:	pgsql-hackers

Greetings,

On Mon, Mar 27, 2023 at 18:17 Bruce Momjian <bruce(at)momjian(dot)us> wrote:

> On Tue, Mar 28, 2023 at 12:01:56AM +0200, Stephen Frost wrote:
> > Greetings,
> >
> > On Mon, Mar 27, 2023 at 12:38 Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> >
> > On Wed, Mar 8, 2023 at 04:25:04PM -0500, Stephen Frost wrote:
> > > Agreed, though the latest efforts include an option for *authenticated*
> > > encryption as well as unauthenticated. That makes it much more
> > > difficult to make undetected changes to the data that's protected by
> > > the authenticated encryption being used.
> >
> > I thought some more about this. GCM-style authentication of encrypted
> > data has value because it assumes the two end points are secure but that
> > a malicious actor could modify data during transfer. In the Postgres
> > case, it seems the two end points and the transfer are all in the same
> > place. Therefore, it is unclear to me the value of using GCM-style
> > authentication because if the GCM-level can be modified, so can the end
> > points, and the encryption key exposed.
> >
> >
> > What are the two end points you are referring to and why don’t you feel
> > there is an opportunity between them for a malicious actor to attack the
> > system?
>
> Uh, TLS can use GCM and in this case you assume the sender and receiver
> are secure, no?

TLS does use GCM, pretty much exclusively as far as I can recall. So do a
lot of other things, though.

> > There are simpler cases to consider than an online attack on a single
> > independent system where an attacker having access to modify the data in
> > transit between PG and the storage would imply the attacker also having
> > access to read keys out of PG’s memory.
>
> I consider the operating system and its processes as much more of a
> single entity than TLS over a network.

This may be the case sometimes, but there’s absolutely no shortage of other
cases, and it’s almost more the rule these days that there is some kind of
network between the OS processes and the storage: a SAN, an iSCSI network,
and NFS are all quite common.
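
To make the storage-network case concrete, here is a minimal sketch of the
difference between unauthenticated and authenticated encryption when someone
between the database process and the storage can alter bytes without holding
any key. It is an illustration only: the SHA-256 keystream cipher and the
HMAC tag are stdlib stand-ins chosen so the example is self-contained, not
AES-GCM and not what any TDE patch implements, and every name in it
(keystream, seal, open_sealed, the sample page) is hypothetical.

```python
import hashlib
import hmac
import os


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy counter-mode keystream built from SHA-256 (illustration only)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor_crypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Unauthenticated encrypt/decrypt: XOR the data with the keystream."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, nonce, len(data))))


def seal(enc_key: bytes, mac_key: bytes, nonce: bytes, page: bytes):
    """Encrypt, then compute a tag over nonce and ciphertext."""
    ct = xor_crypt(enc_key, nonce, page)
    tag = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    return ct, tag


def open_sealed(enc_key: bytes, mac_key: bytes, nonce: bytes, ct: bytes, tag: bytes) -> bytes:
    """Verify the tag before decrypting; refuse modified ciphertext."""
    expected = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        raise ValueError("page failed authentication")
    return xor_crypt(enc_key, nonce, ct)


enc_key, mac_key, nonce = os.urandom(32), os.urandom(32), os.urandom(12)
page = b"balance=0100"
ct, tag = seal(enc_key, mac_key, nonce, page)

# An attacker on the storage network flips bits in the ciphertext without
# knowing any key: byte 8 ('0') is XORed so that it decrypts to '9'.
tampered = bytearray(ct)
tampered[8] ^= ord("0") ^ ord("9")
tampered = bytes(tampered)

# Unauthenticated: the change decrypts cleanly and goes unnoticed.
unauth_result = xor_crypt(enc_key, nonce, tampered)

# Authenticated: the same change is rejected before decryption.
try:
    open_sealed(enc_key, mac_key, nonce, tampered, tag)
    detected = False
except ValueError:
    detected = True
```

The attacker here never reads a key out of memory; write access to the bytes
in transit is enough to change what the unauthenticated mode returns.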

> > As specific examples, consider:
> >
> > An attack against the database system where the database server is shut
> > down, or a backup, and the encryption key isn’t available on the system.
> >
> > The backup system itself, not running as the PG user (an option supported
> > by PG and at least pgbackrest) being compromised, thus allowing for
> > injection of changes into a backup or into a restore.
>
> I then question why we are not adding encryption to pg_basebackup or
> pgbackrest rather than the database system.

pgBackRest has encryption and authentication of it, but that doesn’t
actually address the attack vector that I outlined. If the backup user is
compromised then they can change the data before it gets to the storage.
If the backup user is compromised then they have access to whatever key is
used to encrypt and authenticate the backup and therefore can trivially
manipulate the data.
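
A minimal sketch of that point, assuming a backup tool that authenticates
what it stores with a key of its own: whoever holds that key can alter the
data and produce a tag that verifies, while someone without the key cannot.
HMAC-SHA256 from the standard library stands in for whatever scheme a real
backup tool uses, and the names (backup_key, tag_for, verify) are
hypothetical.

```python
import hashlib
import hmac
import os


def tag_for(key: bytes, data: bytes) -> bytes:
    """Authentication tag over the data under the given key."""
    return hmac.new(key, data, hashlib.sha256).digest()


def verify(key: bytes, data: bytes, tag: bytes) -> bool:
    """Constant-time check that the tag matches the data."""
    return hmac.compare_digest(tag_for(key, data), tag)


backup_key = os.urandom(32)  # key held by the backup process
original = b"page contents from PG"
forged = b"page contents altered"
original_tag = tag_for(backup_key, original)

# Without the key, the best an attacker can do is reuse the old tag,
# which does not match the altered data.
outsider_ok = verify(backup_key, forged, original_tag)

# A compromised backup process holds the key, so it simply re-tags the
# altered data and the result verifies.
insider_ok = verify(backup_key, forged, tag_for(backup_key, forged))
```

If the pages were already authenticated by PG under a key the backup process
never sees, the altered data would still fail PG's own check on restore.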

Encryption of backups by the backup tool serves to protect the data after
it leaves the backup system and is stored in cloud storage or in whatever
format the repository takes. This is beneficial, particularly when the
data itself offers no protection, but it is simply not the same.

> > The beginning of this discussion also very clearly had individuals voicing
> > strong opinions that unauthenticated encryption methods were not
> > acceptable as an end-state for PG due to the clear issue of there then
> > being no protection against modification of data. The approach we are
> > working towards provides
>
> What were the _technical_ reasons for those objections?

I believe largely the ones I’m bringing up here and which I outline above.
I don’t mean to pretend that any of this is of my own independent
construction. I don’t believe it is, and my apologies if it came across that
way.

> > both the unauthenticated option, which clearly has value to a large
> > number of our collective user base considering the number of commercial
> > implementations which have now arisen, and the authenticated solution
> > which goes further and provides the level clearly expected of the PG
> > community. This gets us a win-win situation.
> >
> > > There's clearly user demand for it as there's a number of organizations
> > > who have forks which are providing it in one shape or another. This
> > > kind of splintering of the community is actually an actively bad thing
> > > for the project and is part of what killed Unix, by at least some pretty
> > > reputable accounts, in my view.
> >
> > Yes, the number of commercial implementations of this is a concern. Of
> > course, it is also possible that those commercial implementations are
> > meeting checkbox requirements rather than technical ones, and the
> > community has been hostile to check box-only features.
> >
> >
> > I’ve grown weary of this argument as the other major piece of work it was
> > routinely applied to was RLS and yet that has certainly been seen broadly
> > as a beneficial feature with users clearly leveraging it and in more than
> > some “checkbox” way.
>
> RLS had to overcome that objection, and I think it did, and was better
> for doing that.

Beyond it being called a checkbox, what were the arguments against it? I
don’t object to being challenged to point out the use cases, but I feel
that at least some very clear and straightforward ones are outlined in
what has been said above. I also don’t believe those are the only ones, but
I don’t think I could enumerate every use case for RLS either, even after
seeing it used for quite a few years. I do seriously question the level of
effort expected of features that are claimed to be “checkbox” and tossed
almost exclusively for that reason on this list, given the success of the
ones that have been accepted and are in active use by our users today.

> > We, as a community, are clearly losing value by lack of this capability,
> > if by no other measure than simply the numerous users of the commercial
> > implementations feeling that they simply can’t use PG without this
> > feature, for whatever their reasoning.
>
> That is true, but I go back to my concern over useful feature vs. check
> box.

While it’s easy to label something as checkbox, I don’t feel we have been
fair to our users in doing so, as it has historically blocked features
which our users are demanding and which they end up getting from commercial
providers until we ultimately implement them anyway. This particular
argument simply doesn’t seem to hold the value that proponents of it claim,
for us at least, and we have clear counter-examples which we can point to
and which I hope we learn from.

Thanks!

Stephen

In response to

Re: Moving forward with TDE at 2023-03-27 22:16:59 from Bruce Momjian

Responses

Re: Moving forward with TDE at 2023-03-27 23:19:21 from Bruce Momjian
