Re: Optimizing Nested Correlated Queries by decorrelation: GSOC 2012 Project

From: Arun Chaitanya <chaitan64arun(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimizing Nested Correlated Queries by decorrelation: GSOC 2012 Project
Date: 2012-03-30 14:12:22
Message-ID: CAFDHfCSmuwQ3m3AYFsvR3jfD4=f5fY3KGYJ7CyePAxS1Y67CrQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks a lot Heikki.
I have already posted an example in the mail.
The link to the paper is
http://www.iith.ac.in/~ravig/courses/cs5050/papers/decorrelation-cesar.pdf

Hope this helps,
Arun

On Fri, Mar 30, 2012 at 5:32 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> (off-list)
>
> You'll want to post a link to the paper, otherwise people who are not GSoC
> mentors will have no idea what you're talking about ;-). Posting an example
> query and access plans would illustrate the point, too.
>
>
> On 30.03.2012 14:33, Arun Chaitanya wrote:
>>
>> Hi,
>>
>> I wanted to take up this as a GSOC 2012 project.
>>
>> SQL supports nested queries. When the inner query contains a
>> correlation variable the present optimizer takes an iterative
>> execution plan. If the inner query scans over a relation, the
>> iterative plan chosen can be sub-optimal.
>>
>> The goal of this project is to enable De-correlation for all possible
>> cases. The iterative execution plan can be converted to a set oriented
>> plan by taking a join over the base relations involved. Sufficient
>> work has already been done in this area and has been implemented in
>> SQL Server.
>>
>> The changes required to incorporate the above mentioned strategy is in
>> rewriting phase to the best of my knowledge. The key idea is to
>> introduce the APPLY operator in the raw parse tree. In the above
>> mentioned Papers, the author has mentioned the process of removing the
>> apply. The author has proposed a set of rules which will allow us to
>> achieve the goal. The present postgresql optimizer community has done
>> some work in these lines for simple subqueries involving =,>  ,<  in
>> the predicates [ I observed it by seeing the result of EXPLAIN for
>> relevant queries ]. The optimization is not done for subqueries
>> containing aggregate queries and existential and containment queries.
>>
>> An example query from TPCH benchmark discussed by the author:
>>
>> select c_custkey
>> from customer
>> where 1000000<  (select sum(o_totalprice)
>> from orders
>> where o_custkey = c_custkey)
>>
>> In the above case, c_custkey is a correlation variable (parameterized)
>> coming from the outer query. Hence in the present system, the inner
>> query is executed as many times as the tuples in the customer
>> relation. As the subQuery involves a scan over orders relation, the
>> total I/O cost involved is pretty high.
>>
>> Using the transformation proposed by the author, the query can be
>> re-written as
>>
>> select c_custkey
>> from customer left outer join
>> orders on o_custkey = c custkey
>> group by c_custkey
>> having 1000000<  sum(o_totalprice)
>>
>> This allows the optimizer to chose a better plan for left outer join
>> and avoid iterative execution. The idea here is not to get a rewritten
>> query as output but to generate a plan tree that does the same as the
>> above query.
>>
>> Regards,
>> Arun
>>
>
>
> --
>  Heikki Linnakangas
>  EnterpriseDB   http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Arun Chaitanya 2012-03-30 14:18:54 Re: Optimizing Nested Correlated Queries by decorrelation: GSOC 2012 Project
Previous Message Robert Haas 2012-03-30 14:11:00 Re: Command Triggers patch v18