Re: Column Redaction

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Damian Wolgast <damian(dot)wolgast(at)si-co(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Column Redaction
Date: 2014-10-10 11:27:47
Message-ID: 20141010112747.GD28859@tamriel.snowman.net
Lists: pgsql-hackers

* Heikki Linnakangas (hlinnakangas(at)vmware(dot)com) wrote:
> On 10/10/2014 02:05 PM, Stephen Frost wrote:
> >* Heikki Linnakangas (hlinnakangas(at)vmware(dot)com) wrote:
> >>On 10/10/2014 01:35 PM, Stephen Frost wrote:
> >>>Regarding functions, 'leakproof' functions should be alright to allow,
> >>>though Heikki brings up a good point regarding binary search being
> >>>possible in a plpgsql function (or even directly by a client). Of
> >>>course, that approach also requires that you have a specific item in
> >>>mind.
> >>
> >>It doesn't require that you have a specific item in mind. Binary
> >>search is cheap, O(log n). It's easy to write a function to do a
> >>binary search on a single item, passed as argument, and then apply
> >>that to all rows:
> >>
> >>SELECT binary_search_reveal(cardnumber) FROM redacted_table;
> >
> >Note that your binary_search_reveal wouldn't be marked as leakproof and
> >therefore this wouldn't be allowed. If this was allowed, you'd simply
> >do "raise notice" inside the function and call it a day.
>
> *shrug*, just do the same with a more complicated query, then. Even
> if you can't create a function that does that, you can still execute
> the same logic without a function.

Not sure I see what you're getting at here..? My point was that you'd
need a target number and the system would only provide confirmation that
the number exists, or does not. Your argument was that the table
itself would provide the target number, which was flawed. I don't see
how "just do the same with a more complicated query" removes the need to
have a target number for the binary search.

If you're simply looking for confirmation of existence, the equality
case would be a better argument than the binary search. If the user can
define a table of targets, or use a VALUES construct, and then join to
it, we might build a hash table and provide those results faster than a
binary search would, though this again means that the user is
providing the list of keys to check.
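
To make the equality case concrete, here is a minimal Python sketch of
what such a join amounts to. It is an illustration only, not how the
executor actually works: the function name and the sample numbers are
hypothetical, and a Python set stands in for the hash table.

```python
def confirm_existing(candidates, stored_numbers):
    """Return the user-supplied candidates that exist in the stored column.

    Hypothetical model of an equality join against a redacted column:
    the caller learns only which of its own candidates are present.
    """
    stored = set(stored_numbers)  # stands in for the hash table
    return [c for c in candidates if c in stored]


# Hypothetical sample data, made up for this sketch.
stored_numbers = [4111111111111111, 5500000000000004]
candidates = [4111111111111111, 4000000000000002]

hits = confirm_existing(candidates, stored_numbers)
print(hits)
```

Every value that comes back was already in the list the caller passed
in, which is the point about the user providing the keys to check.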

As mentioned elsewhere on the thread, I agree that this capability
wouldn't be useful if a random search (which is providing the 'targets')
through a 10^16 keyspace generated a significant number of results (I'd
also throw in there "in a reasonable amount of time"; clearly it'd be
possible to extract all keys given sufficient time, even with a random
search). The sketch that Simon outlined won't obviously provide that
guarantee, but I'm not prepared to say we couldn't provide it at all
while meeting the goal he outlined.
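
As a rough back-of-the-envelope check on the random-search case, the
sketch below computes the expected number of hits. It assumes the
stored numbers are distinct and uniformly spread over the 10^16
keyspace and that guesses are uniform and independent; the row count
and guess count are made-up figures, not numbers from this thread.

```python
from fractions import Fraction

KEYSPACE = 10**16  # size of the keyspace mentioned above


def expected_hits(guesses, stored_rows, keyspace=KEYSPACE):
    """Expected number of guesses that match a stored value.

    Each uniform guess hits with probability stored_rows / keyspace,
    so by linearity of expectation the total is guesses times that.
    """
    return Fraction(guesses * stored_rows, keyspace)


def guesses_for_one_hit(stored_rows, keyspace=KEYSPACE):
    """Number of guesses at which the expected hit count reaches one."""
    return Fraction(keyspace, stored_rows)


# Hypothetical figures: one million stored rows, one billion guesses.
print(expected_hits(10**9, 10**6))
print(guesses_for_one_hit(10**6))
```

Under those assumptions the hit rate depends only on how densely the
table populates the keyspace, so a guarantee along these lines would
have to account for how many rows the table holds.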

Thanks,

Stephen
