Re: [HACKERS] please help on query

From: Jakub Ouhrabka <jakub(dot)ouhrabka(at)comgate(dot)cz>
To: Luis Alberto Amigo Navarro <lamigo(at)atc(dot)unican(dot)es>
Cc: Manfred Koizar <mkoi-pg(at)aon(dot)at>, <pgsql-sql(at)postgresql(dot)org>
Subject: Re: [HACKERS] please help on query
Date: 2002-07-12 11:50:29
Message-ID: Pine.LNX.4.44.0207121343260.8671-100000@server
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-sql

hi,

avoid subselect: create a temp table and use join...

CREATE TEMP TABLE tmp AS
SELECT
lineitem.orderkey
FROM
lineitem
WHERE
lineitem.orderkey=orders.orderkey
GROUP BY
lineitem.orderkey HAVING
sum(lineitem.quantity)>300;

CREATE INDEX tmp_idx ON tmp (orderkey);

SELECT
customer.name,
customer.custkey,
orders.orderkey,
orders.orderdate,
orders.totalprice,
sum(lineitem.quantity)
FROM
customer,
orders,
lineitem,
tmp
WHERE
orders.orderkey=tmp.orderkey
AND customer.custkey=orders.custkey
AND orders.orderkey=lineitem.orderkey
GROUP BY
customer.name,
customer.custkey,
orders.orderkey,
orders.orderdate,
orders.totalprice
ORDER BY
orders.totalprice DESC,
orders.orderdate;

may be the index is not necessary...

kuba

> I'm going to keep on asking about another query:
>
> SELECT
> customer.name,
> customer.custkey,
> orders.orderkey,
> orders.orderdate,
> orders.totalprice,
> sum(lineitem.quantity)
> FROM
> customer,
> orders,
> lineitem
> WHERE
> exists(
> SELECT
> lineitem.orderkey
> FROM
> lineitem
> WHERE
> lineitem.orderkey=orders.orderkey
> GROUP BY
> lineitem.orderkey HAVING
> sum(lineitem.quantity)>300
> )
> AND customer.custkey=orders.custkey
> AND orders.orderkey=lineitem.orderkey
> GROUP BY
> customer.name,
> customer.custkey,
> orders.orderkey,
> orders.orderdate,
> orders.totalprice
>
> ORDER BY
> orders.totalprice DESC,
> orders.orderdate;
>
> NOTICE: QUERY PLAN:
>
> Sort (cost=26923941.97..26923941.97 rows=300061 width=66)
> -> Aggregate (cost=26851634.86..26896644.05 rows=300061 width=66)
> -> Group (cost=26851634.86..26889142.52 rows=3000612 width=66)
> -> Sort (cost=26851634.86..26851634.86 rows=3000612
> width=66)
> -> Hash Join (cost=26107574.81..26457309.10
> rows=3000612 width=66)
> -> Seq Scan on lineitem (cost=0.00..222208.25
> rows=6001225 width=8)
> -> Hash (cost=26105699.81..26105699.81
> rows=750000 width=58)
> -> Hash Join (cost=7431.00..26105699.81
> rows=750000 width=58)
> -> Seq Scan on orders
> (cost=0.00..26083268.81 rows=750000 width=25)
> SubPlan
> -> Aggregate
> (cost=0.00..17.35 rows=1 width=8)
> -> Group
> (cost=0.00..17.34 rows=5 width=8)
> -> Index Scan
> using lineitem_pkey on lineitem (cost=0.00..17.33 rows=5 width=8)
> -> Hash (cost=7056.00..7056.00
> rows=150000 width=33)
> -> Seq Scan on customer
> (cost=0.00..7056.00 rows=150000 width=33)
>
> again:
> orders 1500000 tuples
> lineitem 6000000 tuples there are 1 to 7 lineitems per orderkey
> Customer 150000 tuples
>
> select attname,n_distinct,correlation from pg_stats where
> tablename='lineitem';
> attname | n_distinct | correlation
> ---------------+------------+-------------
> orderkey | -0.199847 | 1
> partkey | 196448 | 0.0223377
> suppkey | 9658 | -0.00822751
> linenumber | 7 | 0.17274
> quantity | 50 | 0.0150153
> extendedprice | 25651 | -0.00790245
> discount | 11 | 0.103761
> tax | 9 | 0.0993771
> returnflag | 3 | 0.391434
> linestatus | 2 | 0.509791
> shipdate | 2440 | 0.0072777
> commitdate | 2497 | 0.00698162
> receiptdate | 2416 | 0.00726686
> shipinstruct | 4 | 0.241511
> shipmode | 7 | 0.138432
> comment | 275488 | 0.0188006
> (16 rows)
>
> select attname,n_distinct,correlation from pg_stats where
> tablename='orders';
> attname | n_distinct | correlation
> ---------------+------------+-------------
> orderkey | -1 | -0.999925
> custkey | 76309 | 0.00590596
> orderstatus | 3 | 0.451991
> totalprice | -1 | -0.00768806
> orderdate | 2431 | -0.0211354
> orderpriority | 5 | 0.182489
> clerk | 1009 | 0.00546939
> shippriority | 1 | 1
> comment | -0.750125 | -0.0123887
>
> Customer
> attname | n_distinct | correlation
> ------------+------------+-------------
> custkey | -1 | 1
> name | -1 | 1
> address | -1 | -0.00510274
> nationkey | 25 | 0.0170533
> phone | -1 | -0.0227816
> acctbal | -0.83444 | -0.00220958
> mktsegment | 5 | 0.205013
> comment | -1 | 0.0327827
>
> This query takes 12 minutes to run and returns about 50 customers.
> lineitem.quantity takes values from 1 to 50, so 300 per orderkey is very
> restrictive
>
> May someone help on improving performance?
> Again thanks in advance
> Regards
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/users-lounge/docs/faq.html
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Oleg Bartunov 2002-07-12 12:45:24 Re: urgent needed
Previous Message Luis Alberto Amigo Navarro 2002-07-12 11:37:47 Re: [HACKERS] please help on query

Browse pgsql-sql by date

  From Date Subject
Next Message Jakub Ouhrabka 2002-07-12 11:52:37 Re: [HACKERS] please help on query
Previous Message Luis Alberto Amigo Navarro 2002-07-12 11:37:47 Re: [HACKERS] please help on query