Re: query decorrelation in postgres

From: mahendra chavan <mahcha(at)gmail(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: query decorrelation in postgres
Date: 2009-07-24 15:25:40
Message-ID: 5a0dbeb90907240825g662834ceg80dbb88f073e4c2c@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thank you for you response. I was looking for a query rewriting mechanism
which would be outside the optimizer and will do this kind of
transformations at the query level.

~Mahendra

On Fri, Jul 24, 2009 at 7:32 PM, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov
> wrote:

> mahendra chavan <mahcha(at)gmail(dot)com> wrote:
> > What I meant by de-correlation was optimizing a query to get rid of
> > sub-queirs by using joins.
> >
> > eg. In the TPC-H schema, a query to find out the names of suppliers
> > who supply parts having size < 100
> >
> > *Query with nested subqueries:*
> >
> > SELECT
> > S_NAME
> > FROM
> > SUPPLIER
> > WHERE
> > S_SUPPKEY
> > IN (
> > SELECT
> > PS_SUPPKEY
> > FROM
> > PARTSUPP
> > WHERE
> > PS_PARTKEY
> > IN (
> > SELECT
> > P_PARTKEY
> > FROM
> > PART
> > WHERE
> > P_SIZE < 100
> > )
> >
> >
> >
> > *Query with joins without subqueries:*
> >
> >
> > SELECT
> > S_NAME
> > FROM
> > SUPPLIER
> > INNER JOIN
> > PARTSUPP
> > ON
> > S_SUPPKEY = PS_SUPPKEY
> > INNER JOIN
> > PART
> > ON
> > P_PARTKEY = PS_PARTKEY
> > WHERE
> > P_SIZE < 100
> >
>
> Those two queries aren't exactly identical, because you could get
> duplicate rows in the second which would not be there in the first.
> Optimizations to "pull up" subqueries into a higher level FROM clause
> as joins have been in PostgreSQL for as long as I've been using it,
> but the ability to do the specific optimization you show there
> (without the duplicates) was added in version 8.4 using "semi-joins".
> I don't think any syntax was added to explicitly write a query using
> semi-joins, though; it's an optimization of EXISTS, IN, ANY, etc.
>
> To see how the planner chooses to execute a given query against a
> particular schema which has a particular set of statistics about the
> data distributions, use the EXPLAIN option.
>
> http://www.postgresql.org/docs/8.4/interactive/sql-explain.html
>
> -Kevin
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2009-07-24 15:38:06 Re: bytea vs. pg_dump
Previous Message Tom Lane 2009-07-24 15:15:00 Re: display previous query string of idle-in-transaction