Re: Moving forward with TDE

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Chris Travers <chris(dot)travers(at)gmail(dot)com>, David Christensen <david+pg(at)pgguru(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Moving forward with TDE
Date: 2023-03-28 00:03:50
Message-ID: CAOuzzgppaX+VP5+AkAC7oKUk7DzvN-D-DwwUydkwCSSDDUaEfQ@mail.gmail.com
Lists: pgsql-hackers

Greetings,

On Mon, Mar 27, 2023 at 19:19 Bruce Momjian <bruce(at)momjian(dot)us> wrote:

> On Tue, Mar 28, 2023 at 12:57:42AM +0200, Stephen Frost wrote:
> > I consider the operating system and its processes as much more of a
> > single entity than TLS over a network.
> >
> > This may be the case sometimes but there’s absolutely no shortage of other
> > cases and it’s almost more the rule these days, that there is some kind of
> > network between the OS processes and the storage- a SAN, an iSCSI network, NFS,
> > are all quite common.
>
> Yes, but consider that the database cluster is having to get its data
> from that remote storage --- the remote storage is not an independent
> entity that can be corrupted without the database server being
> compromised. If everything in PGDATA was GCM-verified, it would be
> secure, but because some parts are not, I don't think it would be.

The remote storage is certainly an independent system. Multi-mount LUNs are
entirely possible in a SAN (and certainly with NFS, or the NFS server itself
could be compromised), so while the attacker may not have any access to
the database server itself, they may have access to these other systems,
and that’s not even considering in-transit attacks, which are also
entirely possible, especially with iSCSI or NFS.

I don’t understand the claim that the remote storage is “not an
independent system”, based on my understanding of, e.g., NFS. With NFS, a
directory on the NFS server is exported and the client mounts that
directory locally, all over a network which may or may not be
secured against manipulation. A user with root access on the NFS server can
trivially read and modify the files on that server,
even if they have no access to the PG server. Would you explain what you
mean?

I do agree that the ideal case would be that everything we can encrypt
(not everything can be, for various reasons, but we don’t actually need it
to be either) in the PGDATA directory is encrypted and authenticated, just like
it would be ideal if everything was checksum’d, which it isn’t today. We are
progressing in that direction thanks to efforts such as reworking the other
subsystems to use shared buffers and a consistent page format, but just
like with checksums we do not need to have the perfect solution for us to
provide a lot of value here- and our users know that, as the same is true of
the unauthenticated encryption approaches being offered by the commercial
solutions.

> > As specific examples, consider:
> > >
> > > An attack against the database system where the database server is shut down,
> > > or a backup, and the encryption key isn’t available on the system.
> > >
> > > The backup system itself, not running as the PG user (an option supported by PG
> > > and at least pgbackrest) being compromised, thus allowing for injection of
> > > changes into a backup or into a restore.
> >
> > I then question why we are not adding encryption to pg_basebackup or
> > pgbackrest rather than the database system.
> >
> > Pgbackrest has encryption and authentication of it … but that doesn’t actually
> > address the attack vector that I outlined. If the backup user is compromised
> > then they can change the data before it gets to the storage. If the backup
> > user is compromised then they have access to whatever key is used to encrypt
> > and authenticate the backup and therefore can trivially manipulate the data.
>
> So the idea is that the backup user can be compromised without the data
> being vulnerable --- makes sense, though that use-case seems narrow.

That’s perhaps a fair consideration- but it’s clearly of enough value that
many of our users are asking for it and not using PG because we don’t have
it today. Ultimately though, this clearly makes it more than a “checkbox”
feature. I hope we are able to agree on that now.

> What were the _technical_ reasons for those objections?
> >
> > I believe largely the ones I’m bringing up here and which I outline above… I
> > don’t mean to pretend that any of this is of my own independent construction. I
> > don’t believe it is and my apologies if it came across that way.
>
> Yes, there is value beyond the check-box, but in most cases those
> values are limited considering the complexity of the features, and the
> check-box is what most people are asking for, I think.

Users regularly ask on the lists for this feature; how many more
don’t ask because they google or find prior responses on the list to the
question of whether we have this capability? How do we know that their cases
are “checkbox”? Consider that there are standards groups which explicitly
consider these attack vectors and consider them important enough to require
mitigations to address those vectors. Do the end users of PG understand the
attack vectors or why they matter? Perhaps not, but just because they
can’t articulate the reasoning does NOT mean that the attack vector doesn’t
exist or that their environment is somehow immune to it- indeed, as the
standards bodies surely know, the opposite is true- they’re almost
certainly at risk of those attack vectors and therefore the standards
bodies are absolutely justified in requiring them to provide a solution.
Treating these users as unimportant because they don’t have the depth of
understanding that we do or that the standards body does is not helping
them- it’s actively driving them away from PG.

> > > I’ve grown weary of this argument as the other major piece of work it was
> > > routinely applied to was RLS and yet that has certainly been seen broadly as a
> > > beneficial feature with users clearly leveraging it and in more than some
> > > “checkbox” way.
> >
> > RLS has to overcome that objection, and I think it did, as was better
> > for doing that.
> >
> > Beyond it being called a checkbox - what were the arguments against it?
> I
>
> The RLS arguments were that queries could expose some of the underlying
> data, but in summary, that was considered acceptable.

This is an excellent point- and dovetails very nicely into my argument that
protecting primary data (what is provided by users and ends up in indexes
and heaps) is valuable even if we don’t (yet..) have protection for other
parts of the system. Reducing the size of the attack vector is absolutely
useful, especially when it’s such a large amount of the data in the system.
Yes, we should, and will, continue to improve- as we do with many features,
but we don’t need to wait for perfection to include this feature, just as
with RLS and numerous other features we have.

> > > We, as a community, are clearly losing value by lack of this capability, if by
> > > no other measure than simply the numerous users of the commercial
> > > implementations feeling that they simply can’t use PG without this feature, for
> > > whatever their reasoning.
> >
> > That is true, but I go back to my concern over useful feature vs. check
> > box.
> >
> > While it’s easy to label something as checkbox, I don’t feel we have been fair
>
> No, actually, it isn't. I am not sure why you are saying that.

I’m confused as to what is required to label a feature as a “checkbox”
feature then. What did you use to make that determination of this feature?
I’m happy to be wrong here.

> > to our users in doing so as it has historically prevented features which our
> > users are demanding and end up getting from commercial providers until we
> > implement them ultimately anyway. This particular argument simply doesn’t seem
> > to actually hold the value that proponents of it claim, for us at least, and we
> > have clear counter-examples which we can point to and I hope we learn from
> > those.
>
> I don't think you are addressing actual issues above.

Specifics would be really helpful. I don’t doubt that there are things I’m
missing, but I’ve tried to address each point raised clearly and concisely.

Thanks!

Stephen
