Re: [HACKERS] please help on query

From: Manfred Koizar <mkoi-pg(at)aon(dot)at>
To: "Luis Alberto Amigo Navarro" <lamigo(at)atc(dot)unican(dot)es>
Cc: <pgsql-sql(at)postgresql(dot)org>
Subject: Re: [HACKERS] please help on query
Date: 2002-07-11 16:47:03
Message-ID: gu9riu8qp28ifp40qsva7b9puif47ihi5b@4ax.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-sql

[moving to pgsql-sql]
On Thu, 11 Jul 2002 17:22:14 +0200, "Luis Alberto Amigo Navarro"
<lamigo(at)atc(dot)unican(dot)es> wrote:
>I can't improve performance on this query:
>
>SELECT
> supplier.name,
> supplier.address
>FROM
> supplier,
> nation
>WHERE
> supplier.suppkey IN(
> SELECT
> partsupp.suppkey
> FROM
> partsupp
> WHERE
> partsupp.partkey IN(
> SELECT
> part.partkey
> FROM
> part
> WHERE
> part.name like 'forest%'
> )
> AND partsupp.availqty>(
> SELECT
> 0.5*(sum(lineitem.quantity)::FLOAT)
> FROM
> lineitem
> WHERE
> lineitem.partkey=partsupp.partkey
> AND lineitem.suppkey=partsupp.partkey
^^^^^^^
suppkey ???
> AND lineitem.shipdate>=('1994-01-01')::DATE
> AND lineitem.shipdate<(('1994-01-01')::DATE+('1 year')::INTERVAL)::DATE
> )
> )
> AND supplier.nationkey=nation.nationkey
> AND nation.name='CANADA'
>ORDER BY
> supplier.name;

Luis,
rules of thumb: "Avoid subselects; use joins!" and "If you have to use
subselects, avoid IN, use EXISTS!"

Let's try. If partkey is unique in part, then
| FROM partsupp
| WHERE partsupp.partkey IN (SELECT part.partkey

can be replaced by
FROM partsupp ps, part p
WHERE ps.partkey = p.partkey

or
partsupp ps INNER JOIN part p
ON (ps.partkey = p.partkey AND p.name LIKE '...')

When we ignore "part" for now, your subselect boils down to

| SELECT partsupp.suppkey
| FROM partsupp
| WHERE partsupp.availqty > (
| SELECT 0.5*(sum(lineitem.quantity)::FLOAT)
| FROM lineitem
| WHERE lineitem.partkey=partsupp.partkey
| AND lineitem.suppkey=partsupp.suppkey
| AND lineitem.shipdate BETWEEN ... AND ...
| )

which can be rewritten to (untested)

SELECT ps.suppkey
FROM partsupp ps, lineitem li
WHERE li.partkey=ps.partkey
AND li.suppkey=ps.suppkey
AND lineitem.shipdate BETWEEN ... AND ...
GROUP BY ps.partkey, ps.suppkey
HAVING min(ps.availqty) > 0.5*(sum(lineitem.quantity)::FLOAT)
^^^
As all ps.availqty are equal in one group, you can as well
use max() or avg().

Now we have left only one IN:
| WHERE supplier.suppkey IN (
| SELECT partsupp.suppkey FROM partsupp WHERE <condition> )

Being to lazy to find out, if this can be rewritten to a join, let`s
apply rule 2 here:

WHERE EXISTS (
SELECT ... FROM partsupp ps
WHERE supplier.suppkey = ps.suppkey
AND <condition> )

HTH, but use with a grain of salt ...

>Sort (cost=2777810917708.17..2777810917708.17 rows=200 width=81)
^^^^^^^^^^^^^^^^
BTW, how many years are these? :-)

Servus
Manfred

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Lockhart 2002-07-11 17:38:17 Re: workaround for lack of REPLACE() function
Previous Message Bruce Momjian 2002-07-11 16:46:06 Re: Should this require CASCADE?

Browse pgsql-sql by date

  From Date Subject
Next Message Luis Alberto Amigo Navarro 2002-07-11 17:40:46 Re: [HACKERS] please help on query
Previous Message Stephan Szabo 2002-07-11 16:00:48 Re: how to inherits the references...