Re: index-only quals vs. security_barrier views

From: Noah Misch <noah(at)leadboat(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: index-only quals vs. security_barrier views
Date: 2012-03-02 20:17:44
Message-ID: 20120302201744.GA29063@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Feb 09, 2012 at 12:02:29PM -0500, Robert Haas wrote:
> When Heikki worked up his original index-only scan patches (which
> didn't end up looking much like what eventually got committed), he had
> the notion of an index-only qual. That is, given a query like this:
>
> select sum(1) from foo where substring(a,1,3) = 'abc';
>
> We could evaluate the substring qual before performing the heap fetch,
> and fetch the tuple from the heap only if the qual passes.

> Now, there's a fly in the ointment here, which is that applying
> arbitrary user-defined functions to tuples that might not be visible
> doesn't sound very safe. The user-defined function in question might
> perform some action based on those invisible tuples that has side
> effects, which would be bad, because now we're violating MVCC
> semantics. Or it might throw an error, killing the scan dead on the
> basis of the contents of some tuple that the scan shouldn't even see.
> However, there is certainly a class of functions for which this type
> of optimization would be safe, and it's an awful lot like the set of
> functions that can be safely pushed down through a security_barrier
> view - namely, things that don't have side effects or throw errors.
> So it's possible that the proleakproof flag KaiGai is proposing to add
> to pg_proc could do double duty, serving also to identify when it's
> safe to apply a qual to an index tuple when the corresponding heap
> tuple might be invisible. However, I have some reservations about
> assuming that the two concepts are exactly the same. For one thing,
> people are inevitably going to want to cheat a little bit more here
> than is appropriate for security views, and encouraging people to mark
> things LEAKPROOF when they're really not is a security disaster
> waiting to happen.

The similarity is indeed tempting, but I find the concepts sufficiently
distinct to not make one device serve both. Adding to the reservations you
offer, LEAKPROOF is superuser-only. This other marker would not entail any
special privilege.

> For another thing, there are some important cases
> that this doesn't cover, like:
>
> select * from foo where substring(a,1,3) like '%def%';
>
> The like operator doesn't seem to be leakproof in the security sense,
> because it can throw an error if the pattern is something like a
> single backslash (ERROR: LIKE pattern must not end with escape
> character) and indeed it doesn't seem like it would be safe here
> either if the pattern were stored in the table. But if the pattern
> were constant, it'd be OK, or almost OK: there's still the edge case
> where the table contains invisible rows but no visible ones - whether
> or not we complain about the pattern there ought to be the same as
> whether or not we complain about it on a completely empty table. If
> we got to that point, then we might as well consider the qual
> leakproof for security purposes under the same set of circumstances
> we'd consider it OK to apply to possibly-invisible tuples.

This sort of thing implicates substring(), too, when you call it as
substring(a, 1, b); b < 0 produces an error.

To handle these, I think we'd need a facility along the lines of protransform.
Have a function inspecting call nodes for a particular other function and
determining whether each is ok-for-index-only-quals. You could even force
protransform itself into that role. Create an additional pg_proc entry
identical to the ordinary substring() but for a different name and having the
ok-for-index-only-quals flag. Add a protransform to the main pg_proc entry
that inspects the argument nodes and, when they're safe, replaces the call
with a call to that errorfree_substring() at plan time.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2012-03-02 20:36:41 Re: Collect frequency statistics for arrays
Previous Message Tom Lane 2012-03-02 20:13:02 Re: Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)