Re: [v9.2] Fix Leaky View Problem

From: Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Kohei(dot)Kaigai(at)emea(dot)nec(dot)com, thom(at)linux(dot)com, pgsql-hackers(at)postgresql(dot)org, tgl(at)sss(dot)pgh(dot)pa(dot)us
Subject: Re: [v9.2] Fix Leaky View Problem
Date: 2011-10-16 08:46:04
Message-ID: CADyhKSV8Xmp2ZmQq7Hj6M_7tQ2436QsLYWRb=ReMkLSbjatqFQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Robert,

I'm a bit confusing about this sentence.

> If you can make this work, I think it could be a pretty sweet plannner
> optimization even apart from the implications for security views.
> Consider a query of this form:
>
> A LEFT JOIN B LEFT JOIN C
>
> where B is a view defined as:
>
> B1 JOIN B2 JOIN B3 LEFT JOIN B4 LEFT JOIN B5
>
> Now let's suppose that from_collapse_limit/join_collapse_limit are set
> low enough that we decline to fold these subproblems together.  If
> there happens to be a qual B.x = 1, where B.x is really B1.x, then the
> generated plan sucks, because it will basically lose the ability to
> filter B1 early, very possibly on, say, a unique index.  Or at least a
> highly selective index.
>

I tried to reproduce the scenario with enough small from/join_collapse_limit
(typically 1), but it allows to push down qualifiers into the least scan plan.

E.g)
mytest=# SET from_collapse_limit = 1;
mytest=# SET join_collapse_limit = 1;
mytest=# CREATE VIEW B AS SELECT B1.* FROM B1,B2,B3 WHERE B1.x = B2.x
AND B2.x = B3.x;
mytest=# EXPLAIN SELECT * FROM A,B,C WHERE A.x=B.x AND B.x=C.x AND f_leak(B.y);
QUERY PLAN
------------------------------------------------------------------------------------
Merge Join (cost=381.80..9597.97 rows=586624 width=108)
Merge Cond: (a.x = b1.x)
-> Merge Join (cost=170.85..290.46 rows=7564 width=72)
Merge Cond: (a.x = c.x)
-> Sort (cost=85.43..88.50 rows=1230 width=36)
Sort Key: a.x
-> Seq Scan on a (cost=0.00..22.30 rows=1230 width=36)
-> Sort (cost=85.43..88.50 rows=1230 width=36)
Sort Key: c.x
-> Seq Scan on c (cost=0.00..22.30 rows=1230 width=36)
-> Materialize (cost=210.95..528.56 rows=15510 width=44)
-> Merge Join (cost=210.95..489.78 rows=15510 width=44)
Merge Cond: (b1.x = b3.x)
-> Merge Join (cost=125.52..165.40 rows=2522 width=40)
Merge Cond: (b1.x = b2.x)
-> Sort (cost=40.09..41.12 rows=410 width=36)
Sort Key: b1.x
-> Seq Scan on b1 (cost=0.00..22.30
rows=410 width=36)
Filter: f_leak(y)
-> Sort (cost=85.43..88.50 rows=1230 width=4)
Sort Key: b2.x
-> Seq Scan on b2 (cost=0.00..22.30
rows=1230 width=4)
-> Sort (cost=85.43..88.50 rows=1230 width=4)
Sort Key: b3.x
-> Seq Scan on b3 (cost=0.00..22.30 rows=1230 width=4)
(25 rows)

In this example, f_leak() takes an argument come from B1 table within B view,
and it was correctly distributed to SeqScan on B1.

From perspective of the code, the *_collapse_limit affects the contents of
joinlist being returned from deconstruct_jointree() whether its sub-portion is
flatten, or not.
However, the qualifiers are distributed on distribute_restrictinfo_to_rels() to
RelOptInfo based on its dependency of relations being referenced by
arguments. Thus, the above f_leak() was distributed to B1, not B, because
its arguments come from only B1.

I agree with the following approach to tackle this problem in 100%.
However, I'm unclear how from/join_collapse_limit affects to keep
sub-queries unflatten. It seems to me it is determined based on
the result of is_simple_subquery().

> 1. Let quals percolate down into subqueries.
> 2. Add the notion of a security view, which prevents flattening and
> disables the optimization of patch #1
> 3. Add the notion of a leakproof function, which can benefit from the
> optimization of #1 even when the view involved is a security view as
> introduced in #2
>

Thanks,

2011/10/11 Robert Haas <robertmhaas(at)gmail(dot)com>:
> On Mon, Oct 10, 2011 at 4:28 PM, Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp> wrote:
>> I agreed. We have been on the standpoint that tries to prevent
>> leakable functions to reference a portion of join-tree being already
>> flatten, however, it has been a tough work.
>> It seems to me it is much simple approach that enables to push
>> down only non-leaky functions into inside of sub-queries.
>>
>> An idea is to add a hack on distribute_qual_to_rels() to relocate
>> a qualifier into inside of the sub-query, when it references only
>> a particular sub-query being come from a security view, and
>> when the sub-query satisfies is_simple_subquery(), for example.
>
> If you can make this work, I think it could be a pretty sweet plannner
> optimization even apart from the implications for security views.
> Consider a query of this form:
>
> A LEFT JOIN B LEFT JOIN C
>
> where B is a view defined as:
>
> B1 JOIN B2 JOIN B3 LEFT JOIN B4 LEFT JOIN B5
>
> Now let's suppose that from_collapse_limit/join_collapse_limit are set
> low enough that we decline to fold these subproblems together.  If
> there happens to be a qual B.x = 1, where B.x is really B1.x, then the
> generated plan sucks, because it will basically lose the ability to
> filter B1 early, very possibly on, say, a unique index.  Or at least a
> highly selective index.  If we could allow the B.x qual to trickle
> down inside of the subquery, we'd get a much better plan.  Of course,
> it's still not as good as flattening, because it won't allow us to
> consider as many possible join orders - but the whole point of having
> from_collapse_limit/join_collapse_limit in the first place is that we
> can't consider all the join orders without having planning time and
> memory usage balloon wildly out of control.  And in many real-world
> cases, I think that this would probably mitigate the effects of
> exceeding from_collapse_limit/join_collapse_limit quite a bit.
>
> In order to make it work, though, you'd need to arrange things so that
> we distribute quals to rels in the parent query, then let some of them
> filter down into the subquery, then distribute quals to rels in the
> subquery (possibly adjusting RTE indexes?), then finish planning the
> subquery, then finish planning the parent query.  Not sure how
> possible/straightforward that is.
>
> It's probably a good idea to deal with this part first, because if you
> can't make it work then the whole approach is in trouble.  I'm almost
> imagining that we could break this into three independent patches,
> like this:
>
> 1. Let quals percolate down into subqueries.
> 2. Add the notion of a security view, which prevents flattening and
> disables the optimization of patch #1
> 3. Add the notion of a leakproof function, which can benefit from the
> optimization of #1 even when the view involved is a security view as
> introduced in #2
>
> Unlike the way you have it now, I think those patches could be
> independently committable.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

--
KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2011-10-16 15:33:37 Re: Pushing ScalarArrayOpExpr support into the btree index AM
Previous Message Chris Redekop 2011-10-16 01:33:33 Re: Hot Backup with rsync fails at pg_clog if under load