Re: Transparent Data Encryption (TDE) and encrypted files

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Transparent Data Encryption (TDE) and encrypted files
Date: 2019-10-04 13:18:58
Message-ID: CA+TgmoZhbeYmRoAccJ1oCN03Jz2Uak18QN4afx4WD7g+j7SVcQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Oct 3, 2019 at 9:42 PM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > It doesn't seem like it would require
> > much work at all to construct an argument that a hacker might enjoy
> > having unfettered access to pg_clog even if no other part of the
> > database can be read.
>
> The question isn't about what hackers would like to have access to, it's
> about what would actually provide them with a channel to get information
> that's sensitive, and at what rate. Perhaps there's an argument to be
> made that clog would provide a high enough rate of information that
> could be used to glean sensitive information, but that's certainly not
> an argument that's been put forth, instead it's the knee-jerk reaction
> of "oh goodness, if anything isn't encrypted then hackers will be able
> to get access to everything" and that's just not a real argument.

Well, I gather that you didn't much like my characterization of your
argument as "making it up as you go along," which is probably fair,
but I doubt that the people who are arguing that we should encrypt
everything will appreciate your characterization of their argument as
"knee-jerk" any better.

I think everyone would agree that if you have no information about a
database other than the contents of pg_clog, that's not a meaningful
information leak. You would be able to tell which transactions
committed and which transactions aborted, but since you know nothing
about the data inside those transactions, it's of no use to you.
However, in that situation, you probably wouldn't be attacking the
database in the first place. Most likely you have some knowledge about
what it contains. Maybe there's a stream of sensor data that flows
into the database, and you can see that stream. By watching pg_clog,
you can see when a particular bit of data is rejected. That could be
valuable.
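
As a rough illustration of how little effort that observation takes,
here is a minimal sketch in Python. It relies on the commit log's
on-disk encoding of two status bits per transaction, four transactions
per byte. The function names and the sample bytes are hypothetical,
and the sketch assumes the buffer starts on a byte boundary, so a slot
number is an offset into the buffer rather than an absolute
transaction ID.

```python
# Two status bits per transaction, four transactions per byte, as in
# PostgreSQL's commit log (pg_clog, renamed pg_xact in version 10).
BITS_PER_XACT = 2
XACTS_PER_BYTE = 4
STATUS_NAMES = {
    0b00: "in progress",
    0b01: "committed",
    0b10: "aborted",
    0b11: "sub-committed",
}


def xact_status(clog_bytes: bytes, slot: int) -> str:
    """Return the status name for the transaction at `slot` in the buffer."""
    byte = clog_bytes[slot // XACTS_PER_BYTE]
    shift = (slot % XACTS_PER_BYTE) * BITS_PER_XACT
    return STATUS_NAMES[(byte >> shift) & 0b11]


def aborted_slots(clog_bytes: bytes) -> list[int]:
    """List every slot in the buffer whose status is 'aborted'."""
    total = len(clog_bytes) * XACTS_PER_BYTE
    return [s for s in range(total) if xact_status(clog_bytes, s) == "aborted"]


# Hypothetical sample: eight transactions, all committed except slot 2.
sample = bytes([0x65, 0x55])
print(aborted_slots(sample))
```

An observer who can already see the incoming stream only needs to line
up the aborted slots with the items that arrived at those moments.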

To take a highly artificial example, suppose that the database is fed
by secret video cameras which identify the faces of everyone who
boards a commercial aircraft and record all of those names in a
database, but high-ranking government officials are exempt from the
program and there's a white-list of people whose names can't be
inserted. When the system tries, a constraint violation occurs and the
transaction aborts. Now, if you see a transaction abort show up in
pg_clog, you know that either a high-ranking government official just
tried to walk onto a plane, or the system is broken. If you see a
whole bunch of aborts within a few hours of each other, separated by
lots of successful insertions, maybe you can infer a cabinet meeting.
I don't know. That's a little bit of a stretch, but I don't see any
reason why something like that can't happen. There are probably more
plausible examples.

The point is that it's unreasonable, at least in my view, to decide
that the knowledge of which transactions commit and which transactions
abort isn't sensitive. Yeah, on a lot of systems it won't be, but on
some systems it might be, so it should be encrypted.

What I really find puzzling here is that Cybertec had a patch that
encrypted -- well, I don't remember whether it encrypted this, but it
encrypted a lot of stuff, and it spent a lot of time being concerned
about these exact kinds of issues. I know for example that they
thought about the stats file, which is an even more clear vector for
information leakage than we're talking about here. They thought about
logical decoding spill files, also a clear vector for information
leakage. Pretty sure they also thought about WAL. That's all really
important stuff, and one thing I learned from reading that patch is
that you can't solve those problems in a trivial, mechanical way. Some
of those systems currently write data byte-by-byte, and converting
them to work block-by-block makes encrypting them a lot easier. So it
seems to me that even if you think that patch had the dumbest key
management system in the history of the universe, you ought to be
embracing some of the ideas that are in that patch because they'll
make any future encryption project easier. Instead of arguing about
whether these side-channel attacks are important -- and I seem not to
be alone here in believing that they are -- we could be working to get
code that has already been written to help solve those problems
committed.

I ask again -- why are you so opposed to a single-key,
encrypt-everything approach? Even if you think multiple-key,
encrypt-only-some-things is better, they don't have to block each
other.

> Which database systems have you looked at which have the properties
> you're describing above that we should be working hard towards?

I haven't studied other database systems much myself. I have, however,
talked with coworkers of mine who are trying to convince people to use
PostgreSQL and/or Advanced Server, and I've heard a lot from them
about what the customers with whom they work would like to see. I base
my comments on those conversations. What I hear from them is basically
that anything we could give them would help. More would be better than
less, of course. People would like a solution with key rotation better
than one without; fine-grained encryption better than coarse-grained
encryption; less performance overhead better than more; and an
encryption algorithm perceived as highly secure better than one
perceived as less secure. But having anything at all would help.

Secondarily, what I hear is that a lot of EnterpriseDB customers or
potential customers reject filesystem encryption not so much because
it's not sufficiently fine-grained, but rather because it depends on
root@localhost. Getting root@localhost to cooperate is difficult and
undesirable, and also filesystem encryption doesn't help at all to
protect against root@localhost. I've pointed out repeatedly to many
people that putting the encryption inside the database doesn't
*really* fix this problem, because root@localhost can ultimately do
anything. But, as I said in my earlier email, people perceive that if
the filesystem does the encryption, root can just cp all the files and
win, whereas if the database does the encryption, that doesn't work,
and root's got to work harder. That seems to matter to a lot of people
who are talking to my colleagues here at EnterpriseDB. That may, of
course, not matter to your users, and that's fine. I'm not trying to
block people from attacking this problem from other angles; but I *am*
frustrated that you seem to be trying to block what seems to me to be
the most promising angle.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
