Re: Column Redaction

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Damian Wolgast <damian(dot)wolgast(at)si-co(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Column Redaction
Date: 2014-10-10 14:53:55
Message-ID: 20141010145355.GF28859@tamriel.snowman.net
Lists: pgsql-hackers

* Heikki Linnakangas (hlinnakangas(at)vmware(dot)com) wrote:
> You said above that it's OK to pass the card numbers to leakproof
> functions. But if you allow that, you can write a function that
> takes as argument a redacted card number, and unredacts it (using
> the < and = operators in a binary search). And then you can just do
> "SELECT unredact(card_number) from redacted_table".
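This Python sketch makes the quoted attack concrete. It assumes a hypothetical `RedactedValue` type that hides the number but still answers `<` and `=`. It models the idea only and is not PostgreSQL code. Each comparison halves the remaining range, so a 16-digit number falls in at most 54 rounds of binary search:

```python
class RedactedValue:
    """Hypothetical stand-in for a redacted column value. It never prints
    its contents, but it still exposes the < and = comparisons that
    leakproof operators would allow."""

    def __init__(self, secret: int) -> None:
        self._secret = secret
        self.comparisons = 0  # number of oracle queries the attack has made

    def __repr__(self) -> str:
        return "RedactedValue(****)"

    def __lt__(self, other: int) -> bool:
        self.comparisons += 1
        return self._secret < other

    def __eq__(self, other: object) -> bool:
        self.comparisons += 1
        return self._secret == other


def unredact(value: RedactedValue, lo: int = 0, hi: int = 10**16) -> int:
    """Recover the hidden value by binary search over [lo, hi) using only
    the = and < operators."""
    while lo < hi:
        mid = (lo + hi) // 2
        if value == mid:
            return mid
        if value < mid:
            hi = mid
        else:
            lo = mid + 1
    raise ValueError("value not in search range")
```

If the type offered only `=`, the same wrapper could answer nothing but yes/no questions about specific guesses. The reply below draws exactly that distinction.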

Not sure I'm following what you mean by 'redacted'. The original
proposal provided '**** **** **** 1234' as the 'redacted' number, and
I'm not seeing how you can get the rest of the number trivially with
just equality and binary search.

If you start with a complete number then you can get the system to tell
you if it exists or not with a binary search or even just doing an
equality check.

> You seem to have something stronger in mind: only allow the equality
> operator on the redacted column, and nothing else.

That wasn't my suggestion; I was merely pointing out that if you have a
complete number that you want to check for existence (perhaps one picked
at random, filtered on the known last four digits, which reduces the
search space to 10^12), you can check it directly. No need for
a binary search at all.
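The equality-only case can be sketched in Python as follows. `exists` is a hypothetical oracle. It stands in for whatever query tells the user whether a complete number is present, such as an equality filter on the redacted column. Every candidate costs one probe, so the 16-digit search space with the last four digits known is 10^12:

```python
from typing import Callable, Iterator, Optional


def search_space(last_four: str, total_digits: int = 16) -> int:
    """Number of candidates left once the last four digits are known."""
    return 10 ** (total_digits - len(last_four))


def candidates_with_suffix(last_four: str, total_digits: int = 16) -> Iterator[str]:
    """Yield every total_digits-long number that ends in last_four."""
    unknown = total_digits - len(last_four)
    for prefix in range(10 ** unknown):
        yield f"{prefix:0{unknown}d}{last_four}"


def find_by_equality(
    exists: Callable[[str], bool], last_four: str, total_digits: int = 16
) -> Optional[str]:
    """Probe candidates one at a time with an equality-only oracle
    (hypothetical); return the first one it confirms, or None."""
    for candidate in candidates_with_suffix(last_four, total_digits):
        if exists(candidate):
            return candidate
    return None
```

Checking one specific complete number is a single call to `exists`. The loop is only needed when the number has to be searched for.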

> That might be
> better, although I'm not really convinced. There are just too many
> ways you could still leak the datum. Just a random example, inspired
> by the recent CRIME attack on SSL: build a row with the redacted
> datum, and another "guess" datum, and store it along with 1k of
> other data in a temporary table. The row gets toasted. Observe how
> much it compressed; if the guess datum is close to the original
> datum, it compresses well. Now, you can probably stop that
> particular attack with more restrictions on what you can do with the
> datum, but that just shows that pretty much any computation you
> allow with the datum can be used to reveal its value.
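This small Python example illustrates the compression side channel described in the quote. It uses `zlib` as a stand-in compressor. That is an assumption: TOAST uses PostgreSQL's own pglz compressor by default, and the row layout here is made up. A guess that repeats the secret becomes a back-reference, so the row compresses to fewer bytes:

```python
import zlib

# Fixed padding standing in for the "1k of other data" stored in the row.
FILLER = b"x" * 1024


def compressed_row_size(secret: bytes, guess: bytes) -> int:
    """Compressed size of a made-up row holding the secret, a guess and
    filler. zlib is used here only as an illustrative compressor."""
    row = b"row:" + secret + b"|" + guess + FILLER
    return len(zlib.compress(row))


def rank_guesses(secret: bytes, guesses: list[bytes]) -> list[bytes]:
    """Order guesses from best to worst compression: the attacker only
    observes sizes, never the secret itself."""
    return sorted(guesses, key=lambda g: compressed_row_size(secret, g))
```

For example, with `secret = b"4929381726354012"`, the row built with the exact secret as the guess compresses smaller than one built with an unrelated guess such as `b"7305186249071538"`.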

One concept I've been thinking about is the notion of 'trusted' data
sources that comparisons would be allowed against. Perhaps individual
values supplied by the user would be allowed too, but my thought is that you have:

master_table
trusted_table

The idea is that you can't view the sensitive column in either the master or
the trusted table, but you can join the two on the sensitive
column and view other, non-sensitive, attributes of both tables. You
might even allow other transformations on the sensitive column, provided
they always result in a boolean comparison against another sensitive column.
Not sure if that really solves Simon's use-case exactly, but it might
tease out other thoughts.
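This rough Python model of the master_table/trusted_table idea uses hypothetical names throughout (`Sensitive`, `join_on_sensitive`, and the `card`, `order_id` and `flag` columns). A sensitive value can only be tested for equality against another sensitive value, and the join returns only non-sensitive attributes. Python cannot actually hide the attribute, so this models the access rule rather than enforcing it:

```python
class Sensitive:
    """Hypothetical wrapper for a sensitive column value. It never prints
    its contents and can only be compared with another Sensitive value.
    (Python cannot truly hide _value; this only models the rule.)"""

    __slots__ = ("_value",)

    def __init__(self, value: str) -> None:
        self._value = value

    def __repr__(self) -> str:
        return "Sensitive(****)"

    def matches(self, other: "Sensitive") -> bool:
        # Only sensitive-to-sensitive comparisons are permitted.
        if not isinstance(other, Sensitive):
            raise TypeError("sensitive values may only be compared to sensitive values")
        return self._value == other._value


def join_on_sensitive(master: list[dict], trusted: list[dict], key: str) -> list[dict]:
    """Nested-loop join on the sensitive column. The key column is dropped
    from the output, so only non-sensitive attributes are returned."""
    result = []
    for m in master:
        for t in trusted:
            if m[key].matches(t[key]):
                row = {k: v for k, v in m.items() if k != key}
                row.update({k: v for k, v in t.items() if k != key})
                result.append(row)
    return result
```

Comparing a sensitive value against a plain, user-supplied string raises `TypeError` here. That matches the stricter variant, in which only trusted sources can be compared against.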

Thanks!

Stephen
