Re: optimizing constant quals within outer joins

From: Phil Frost <indigo(at)bitglue(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: optimizing constant quals within outer joins
Date: 2006-06-28 16:24:04
Message-ID: 20060628162404.GB23509@unununium.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jun 28, 2006 at 11:40:52AM -0400, Tom Lane wrote:
> Phil Frost <indigo(at)bitglue(dot)com> writes:
> > The planner in fact can move the function around without changing the
> > output.
>
> Not when it's within the nullable side of an outer join --- moving a
> WHERE clause up out of that would make the difference between no row
> out, and a null-extended row out, which are certainly not the same.
>
> I'm not sure why it's not pulling up from the left side of the left join
> though. That might be a bug. What PG version is this exactly?
>
> Of course the real question is why is your app generating such poorly
> phrased queries ;-)

Sure it can't pull the condition to the root result node, but it can
make an intermediate result node that is a child of the join and wraps
the sequential scan. "offset 0" makes it do this. I'd like this:

create table a(i int);
create table b(i int);
create function stable_function() returns bool language plpgsql stable as $$
begin return true; end $$;
create view c as select * from b where stable_function();
explain select * from a left join c using (i);
QUERY PLAN
-----------------------------------------------------------------
Merge Right Join (cost=220.32..338.32 rows=7629 width=4)
Merge Cond: ("outer".i = "inner".i)
-> Sort (cost=70.54..72.32 rows=713 width=4)
Sort Key: b.i
-> Seq Scan on b (cost=0.00..36.75 rows=713 width=4)
Filter: stable_function()
-> Sort (cost=149.78..155.13 rows=2140 width=4)
Sort Key: a.i
-> Seq Scan on a (cost=0.00..31.40 rows=2140 width=4)

to become this:

QUERY PLAN
-----------------------------------------------------------------
Merge Right Join (cost=220.32..338.32 rows=7629 width=4)
Merge Cond: ("outer".i = "inner".i)
-> Sort (cost=70.54..72.32 rows=713 width=4)
Sort Key: b.i
-> Result
One-Time Filter: stable_function()
-> Seq Scan on b (cost=0.00..36.75 rows=713 width=4)
Filter: stable_function()
-> Sort (cost=149.78..155.13 rows=2140 width=4)
Sort Key: a.i
-> Seq Scan on a (cost=0.00..31.40 rows=2140 width=4)

That will make the same results. Maybe there is something about the
implementation that I don't understand that makes it hard, but the
concept is simple: before you do a seq scan on b, you call
stable_function(), and if it returns true, you just do the sequential
scan without calling stable_function() for each row. If it returns
false, you can not do the sequental scan at all, and return the empty
set immediately.

I wasn't aware my queries are "badly phrased". The application generates
quite nice queries like "select * from saleorder_summary", which is a view
along the lines of 'select * from "order" left join saleorder using
(objectid)'. "order" and "saleorder" are views like "select * from
private.order where i_have_global_priv(20)". The subqueries are in the
examples I gave just to make it simpler to demonstrate.

The only other way I can think of phrasing a query like that is perhaps

select *
from private.order
left join purchaseorder on (
order.objectid = purchaseorder.objectid and i_have_global_priv(31)
)

This of course would not only be hugely inconvinent, but would require
that regular users have unrestricted access to the base tables, which
totally defeats the purpose of using veil. Also, that too is not
optimized as well as it could be:

test=# explain select * from a left join b on (a.i = b.i and stable_function());
QUERY PLAN
-----------------------------------------------------------------
Merge Left Join (cost=299.56..710.97 rows=7633 width=8)
Merge Cond: ("outer".i = "inner".i)
Join Filter: stable_function()
-> Sort (cost=149.78..155.13 rows=2140 width=4)
Sort Key: a.i
-> Seq Scan on a (cost=0.00..31.40 rows=2140 width=4)
-> Sort (cost=149.78..155.13 rows=2140 width=4)
Sort Key: b.i
-> Seq Scan on b (cost=0.00..31.40 rows=2140 width=4)

stable_function() will still be called multiple times needlessly.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Marc Munro 2006-06-28 16:28:14 Index corruption
Previous Message Tom Lane 2006-06-28 16:09:51 Instability in TRUNCATE regression test