Thanks a lot Heikki.
I have already posted an example in the mail.
The link to the paper is
Hope this helps,
On Fri, Mar 30, 2012 at 5:32 PM, Heikki Linnakangas
> You'll want to post a link to the paper, otherwise people who are not GSoC
> mentors will have no idea what you're talking about ;-). Posting an example
> query and access plans would illustrate the point, too.
> On 30.03.2012 14:33, Arun Chaitanya wrote:
>> I wanted to take up this as a GSOC 2012 project.
>> SQL supports nested queries. When the inner query contains a
>> correlation variable the present optimizer takes an iterative
>> execution plan. If the inner query scans over a relation, the
>> iterative plan chosen can be sub-optimal.
>> The goal of this project is to enable De-correlation for all possible
>> cases. The iterative execution plan can be converted to a set oriented
>> plan by taking a join over the base relations involved. Sufficient
>> work has already been done in this area and has been implemented in
>> SQL Server.
>> The changes required to incorporate the above mentioned strategy is in
>> rewriting phase to the best of my knowledge. The key idea is to
>> introduce the APPLY operator in the raw parse tree. In the above
>> mentioned Papers, the author has mentioned the process of removing the
>> apply. The author has proposed a set of rules which will allow us to
>> achieve the goal. The present postgresql optimizer community has done
>> some work in these lines for simple subqueries involving =,> ,< in
>> the predicates [ I observed it by seeing the result of EXPLAIN for
>> relevant queries ]. The optimization is not done for subqueries
>> containing aggregate queries and existential and containment queries.
>> An example query from TPCH benchmark discussed by the author:
>> select c_custkey
>> from customer
>> where 1000000< (select sum(o_totalprice)
>> from orders
>> where o_custkey = c_custkey)
>> In the above case, c_custkey is a correlation variable (parameterized)
>> coming from the outer query. Hence in the present system, the inner
>> query is executed as many times as the tuples in the customer
>> relation. As the subQuery involves a scan over orders relation, the
>> total I/O cost involved is pretty high.
>> Using the transformation proposed by the author, the query can be
>> re-written as
>> select c_custkey
>> from customer left outer join
>> orders on o_custkey = c custkey
>> group by c_custkey
>> having 1000000< sum(o_totalprice)
>> This allows the optimizer to chose a better plan for left outer join
>> and avoid iterative execution. The idea here is not to get a rewritten
>> query as output but to generate a plan tree that does the same as the
>> above query.
> Heikki Linnakangas
> EnterpriseDB http://www.enterprisedb.com
In response to
pgsql-hackers by date
|Next:||From: Arun Chaitanya||Date: 2012-03-30 14:18:54|
|Subject: Re: Optimizing Nested Correlated Queries by decorrelation:
GSOC 2012 Project|
|Previous:||From: Robert Haas||Date: 2012-03-30 14:11:00|
|Subject: Re: Command Triggers patch v18|