Re: leaky views, yet again

From: KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>, Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: leaky views, yet again
Date: 2010-10-06 00:29:59
Message-ID: 4CABC307.1070808@ak.jp.nec.com
Lists: pgsql-hackers

(2010/10/06 4:06), Robert Haas wrote:
> On Tue, Oct 5, 2010 at 2:48 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Heikki Linnakangas<heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>>> On 05.10.2010 21:08, Greg Stark wrote:
>>>> If the users that have select access on the view don't have DDL access
>>>> doesn't that make them leak-proof for those users?
>>
>>> No. You can use built-in functions for leaking data as well.
>>
>> There's a difference between "can be used to extract data wholesale"
>> and "can be used to probe for the existence of a specific value".
>> We might need to start using more specific terminology than "leak".
>
> Yeah. There are a lot of cases. The worst is if you can (1a) dump
> the underlying table wholesale, or maybe (1b) extract it one row at a
> time or something like that. Not quite as bad is if you can (2) infer
> the presence or absence of particular values in particular columns,
> e.g. via division-by-zero. This is still pretty bad though, because
> you can probably just keep guessing until you eventually can enumerate
> everything in that column. If it's a text field or a UUID that may be
> pretty hard, but if the range of interesting values for that column is
> limited to, say, a million or so, then you can just iterate through
> them until you find everything. A related problem is where you can
> (3) infer the frequency of a value based on the plan choice, either by
> viewing the EXPLAIN output directly or by timing attacks; and then
> there's (4) the ability to infer something about the overall amount of
> data in the underlying table. Any others?
>
> Of those, I'm inclined to think that it's possible to close off (1)
> and (2) pretty thoroughly with sufficient care, but the best you'd be
> able to do for (3) and (4) is refuse to EXPLAIN to a user without
> sufficient privileges; the timing attacks seem intractable.
>

Thanks for the good summary.

I also think case (1) should be closed off soon, because it exposes
hidden data contents without any inference on the attacker's part, and
its throughput is far too high to ignore, so it poses a relatively
higher threat than the other cases.
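
As a rough illustration of case (1), here is a toy Python model. It is
not PostgreSQL code and not part of any patch; the names ROWS, scan,
view_qual and leaky are hypothetical. It only assumes that a scan
evaluates its qualifiers in some order, and that a user-supplied
function may be placed ahead of the view's own qualifier.

```python
# Toy model of case (1): a user-supplied function with a side effect is
# evaluated before the view's security qualifier and sees every row.
# All names here are hypothetical and exist only for illustration.

ROWS = [
    {"owner": "alice", "secret": "a-1"},
    {"owner": "bob", "secret": "b-2"},
    {"owner": "carol", "secret": "c-3"},
]


def view_qual(row, current_user):
    """The view's own filter: a user may only see rows they own."""
    return row["owner"] == current_user


def scan(rows, quals):
    """Return rows passing every qual; quals run in the given order
    and evaluation stops at the first qual that returns False."""
    return [row for row in rows if all(q(row) for q in quals)]


leaked = []


def leaky(row):
    """Attacker's function: always true, but records what it was shown."""
    leaked.append(row["secret"])
    return True


def security(row):
    return view_qual(row, "bob")


# Unsafe order: the attacker's function runs before the security qual.
leaked.clear()
visible_unsafe = scan(ROWS, [leaky, security])
leaked_unsafe = list(leaked)

# Safe order: the security qual runs first, so leaky only sees bob's row.
leaked.clear()
visible_safe = scan(ROWS, [security, leaky])
leaked_safe = list(leaked)
```

In both orders the query result contains only bob's row; the difference
is in what the attacker's function gets to observe along the way.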

<side-note>
The idea of throughput is not my own. It comes from a classic set of
security evaluation criteria: the Trusted Computer System Evaluation
Criteria (TCSEC, 1985).

See page 80 of:
http://csrc.nist.gov/publications/history/dod85.pdf

| From a security perspective, covert channels with low bandwidths represent a
| lower threat than those with high bandwidths. However, for many types of
| covert channels, techniques used to reduce the bandwidth below a certain rate
| (which depends on the specific channel mechanism and the system architecture)
| also have the effect of degrading the performance provided to legitimate
| system users. Hence, a trade-off between system performance and covert
| channel bandwidth must be made.
</side-note>

I also think we should address part of the (2) cases.
Could we split (2) into two cases?

(2a) lets people see a hidden value through an error message. In this case,
people see the hidden value directly, but the throughput is not high.
(2b) lets people infer the existence or absence of a certain value through
PK or UNIQUE conflicts.
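
The two sub-cases can be sketched in Python. This is only a stand-in
under stated assumptions: it uses Python's int() conversion and the
standard sqlite3 module instead of PostgreSQL, SQLite has no row-level
security, and the accounts table and both probe functions are
hypothetical names chosen for illustration.

```python
import sqlite3


def probe_by_error(hidden_value):
    """(2a): a failing conversion echoes the hidden value in its
    error text, so the attacker reads the value directly."""
    try:
        int(hidden_value)
    except ValueError as exc:
        return str(exc)
    return None


def make_db():
    """Build an in-memory table whose contents the attacker cannot read
    (assumed), but into which the attacker may attempt inserts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT UNIQUE)")
    conn.execute("INSERT INTO accounts (name) VALUES ('hidden-user')")
    conn.commit()
    return conn


def probe_by_unique_conflict(conn, guess):
    """(2b): return True if inserting the guess hits a UNIQUE conflict,
    which reveals that the value already exists. One guess per probe."""
    try:
        conn.execute("INSERT INTO accounts (name) VALUES (?)", (guess,))
        return False
    except sqlite3.IntegrityError:
        return True
    finally:
        # Undo a successful probe insert so the table is left unchanged.
        conn.rollback()
```

For example, probe_by_error("s3cret") returns an error string that
contains the value itself, while probe_by_unique_conflict only answers
yes or no for each guess, which is why (2b) needs the attacker to
enumerate candidate values.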

(2a) is the reason why my patch allows pushing down only operators
implemented by internal functions, because these do not obviously leak.
However, I don't think (2b) is a case we should fix here, because
no commercial RDBMS with RLS handles this kind of side-channel
attack using key conflicts.

And it seems to me that the cost of addressing cases (3) and (4) would
be too high, so I don't think fixing them is worthwhile.

Thanks,
--
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
