index-only quals vs. security_barrier views

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: index-only quals vs. security_barrier views
Date: 2012-02-09 17:02:29
Message-ID: CA+Tgmob3PQa=FoO=s=Eb4H=vvewPvKnJiqyjFFm7hyb8Q7ajbg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

When Heikki worked up his original index-only scan patches (which
didn't end up looking much like what eventually got committed), he had
the notion of an index-only qual. That is, given a query like this:

select sum(1) from foo where substring(a,1,3) = 'abc';

We could evaluate the substring qual before performing the heap fetch,
and fetch the tuple from the heap only if the qual passes. The
current code is capable of generating an index-only scan plan for this
query, but the MVCC visibility check always happens first: if the page
is all-visible, we go ahead and evaluate the qual using only the index
data, but if the page is not all-visible, we do the heap fetch first
and then check the qual only if the tuple is visible to our snapshot.
It would be nice to have the ability to do those checks in the other
order, in the case where the projected cost of checking the qual is
less than the projected cost of the heap fetch. This would allow
index-only scans to win in more situations than they can right now,
because we'd conceivably avoid quite a bit of random I/O if the qual
is fairly selective and there are a decent number of visibility map
bits that are unset.

In fact, this technique might pay off even if we don't have a covering index:

select * from foo where substring(a,1,3) = 'abc';

If the expected selectivity of the qual is low and the index is a lot
smaller than the table, we might want to iterate through all the index
tuples in the entire index and fetch the heap tuples for only those
where the qual passes. This would allow index-only scans - or
whatever term you want to use, since there's not much that's index
"only" about this case - to potentially win even when no
visibility-map bits are set at all.

Now, there's a fly in the ointment here, which is that applying
arbitrary user-defined functions to tuples that might not be visible
doesn't sound very safe. The user-defined function in question might
perform some action based on those invisible tuples that has side
effects, which would be bad, because now we're violating MVCC
semantics. Or it might throw an error, killing the scan dead on the
basis of the contents of some tuple that the scan shouldn't even see.
However, there is certainly a class of functions for which this type
of optimization would be safe, and it's an awful lot like the set of
functions that can be safely pushed down through a security_barrier
view - namely, things that don't have side effects or throw errors.
So it's possible that the proleakproof flag KaiGai is proposing to add
to pg_proc could do double duty, serving also to identify when it's
safe to apply a qual to an index tuple when the corresponding heap
tuple might be invisible. However, I have some reservations about
assuming that the two concepts are exactly the same. For one thing,
people are inevitably going to want to cheat a little bit more here
than is appropriate for security views, and encouraging people to mark
things LEAKPROOF when they're really not is a security disaster
waiting to happen. For another thing, there are some important cases
that this doesn't cover, like:

select * from foo where substring(a,1,3) like '%def%';

The like operator doesn't seem to be leakproof in the security sense,
because it can throw an error if the pattern is something like a
single backslash (ERROR: LIKE pattern must not end with escape
character) and indeed it doesn't seem like it would be safe here
either if the pattern were stored in the table. But if the pattern
were constant, it'd be OK, or almost OK: there's still the edge case
where the table contains invisible rows but no visible ones - whether
or not we complain about the pattern there ought to be the same as
whether or not we complain about it on a completely empty table. If
we got to that point, then we might as well consider the qual
leakproof for security purposes under the same set of circumstances
we'd consider it OK to apply to possibly-invisible tuples.

I don't have any appetite for trying to do anything more with
index-only scans for 9.2, though maybe someone else will think
otherwise. But I would like very much to get KaiGai's leakproof stuff
committed, and so it seems like a good idea to reconcile the needs of
that machinery with what might eventually be needed here.

Comments?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2012-02-09 17:16:20 Re: Progress on fast path sorting, btree index creation time
Previous Message Kevin Grittner 2012-02-09 16:18:21 Notify system doesn't recover from "No space" error