
Query running slower than same on Oracle

From: "Sailer, Denis (YBUSA-CDR)" <Denis(dot)Sailer(at)Yellowbook(dot)com>
To: "'pgsql-performance(at)postgresql(dot)org'" <pgsql-performance(at)postgresql(dot)org>
Subject: Query running slower than same on Oracle
Date: 2003-06-25 21:33:16
Message-ID: 58D6EF5727036048BC9E08ECB711A42092ECD0@crmsmail.corp.ybusa.net
Lists: pgsql-performance
We are evaluating PostgreSQL for a typical data warehouse application.  The 3
tables below are part of a star-schema design.  The query listed below runs
in 16 seconds on Oracle 9.2 and in 3+ minutes on PostgreSQL 7.3.3.  Here are
the details.

I'm wondering what else can be done to tune this type of query.  Is 3
minutes reasonable given the amount of data loaded into the 3 tables?  Has
anyone else made comparisons between Oracle and PostgreSQL?

--------------------------------------------------------------------------------
Oracle 9.2 is running on a Windows 2000 server, 600MHz PIII, 512MB RAM
	Shared Pool	48MB
	Buffer Cache	98MB
	Large Pool	  8MB
	Java Pool	32MB
			=========
	Total SGA	186MB

--------------------------------------------------------------------------------
PostgreSQL is running on Red Hat Linux 7.2, 733MHz PIII processor, 383MB RAM.
	shared_buffers = 12384		(96 MB)
	sort_mem = 16384
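For context, those two settings live in postgresql.conf; the sketch below also shows effective_cache_size, a 7.3-era planner knob the post does not mention, whose value here is purely hypothetical:

```
# postgresql.conf sketch -- only the first two lines come from the post
shared_buffers = 12384          # 12384 x 8KB pages, roughly 96 MB
sort_mem = 16384                # KB available per sort/hash operation
effective_cache_size = 32768    # hypothetical (~256 MB); tells the planner
                                # how much of the database the OS can cache
```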

--------------------------------------------------------------------------------

explain analyze
select fiscalyearquarter, description, sum(amount_quantity)
from time t, revenue r, statistic s
where t.fiscalyear = 2002
  and r.timekey = t.timekey
  and r.statisticskey = s.statisticskey
group by fiscalyearquarter, description;
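As an aside, the same query can be written in explicit-JOIN form (a sketch using the table and column names from the query above).  This matters for tuning because in PostgreSQL releases up through 7.3, explicit JOIN syntax also fixes the join order the planner will use:

```sql
-- Equivalent explicit-JOIN form; in 7.3 the planner joins the tables
-- in the order written, so this doubles as a join-order hint.
select t.fiscalyearquarter, s.description, sum(r.amount_quantity)
from revenue r
  join time t on r.timekey = t.timekey
  join statistic s on r.statisticskey = s.statisticskey
where t.fiscalyear = 2002
group by t.fiscalyearquarter, s.description;
```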

                                   QUERY PLAN
--------------------------------------------------------------------------------
 Aggregate  (cost=124685.74..127078.87 rows=23931 width=48) (actual time=170682.53..189640.85 rows=8 loops=1)
   ->  Group  (cost=124685.74..126480.59 rows=239313 width=48) (actual time=169508.49..185478.90 rows=1082454 loops=1)
         ->  Sort  (cost=124685.74..125284.02 rows=239313 width=48) (actual time=169508.47..171853.03 rows=1082454 loops=1)
               Sort Key: t.fiscalyearquarter, s.description
               ->  Hash Join  (cost=6.46..94784.90 rows=239313 width=48) (actual time=140.20..47685.46 rows=1082454 loops=1)
                     Hash Cond: ("outer".statisticskey = "inner".statisticskey)
                     ->  Hash Join  (cost=5.43..90595.90 rows=239313 width=32) (actual time=139.96..39672.76 rows=1082454 loops=1)
                           Hash Cond: ("outer".timekey = "inner".timekey)
                           ->  Seq Scan on revenue r  (cost=0.00..68454.04 rows=3829004 width=17) (actual time=0.01..26336.95 rows=3829004 loops=1)
                           ->  Hash  (cost=5.40..5.40 rows=12 width=15) (actual time=0.79..0.79 rows=0 loops=1)
                                 ->  Seq Scan on "time" t  (cost=0.00..5.40 rows=12 width=15) (actual time=0.36..0.75 rows=12 loops=1)
                                       Filter: (fiscalyear = 2002::numeric)
                     ->  Hash  (cost=1.02..1.02 rows=2 width=16) (actual time=0.04..0.04 rows=0 loops=1)
                           ->  Seq Scan on statistic s  (cost=0.00..1.02 rows=2 width=16) (actual time=0.02..0.03 rows=2 loops=1)
 Total runtime: 195409.79 msec


This gives you an idea of the size of each table in the query:
--------------------------------------------------------------------------------

pubnet=# vacuum analyze verbose revenue;
INFO:  --Relation dw.revenue--
INFO:  Pages 30164: Changed 0, Empty 0; Tup 3829004: Vac 0, Keep 0, UnUsed 17.
        Total CPU 1.87s/0.73u sec elapsed 9.97 sec.
INFO:  Analyzing dw.revenue
VACUUM
pubnet=# vacuum analyze verbose statistic;
INFO:  --Relation dw.statistic--
INFO:  Pages 1: Changed 0, Empty 0; Tup 2: Vac 0, Keep 0, UnUsed 1.
        Total CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO:  Analyzing dw.statistic
VACUUM
pubnet=# vacuum analyze verbose time;
INFO:  --Relation dw.time--
INFO:  Pages 3: Changed 0, Empty 0; Tup 192: Vac 0, Keep 0, UnUsed 33.
        Total CPU 0.00s/0.00u sec elapsed 0.02 sec.
INFO:  Analyzing dw.time
VACUUM
pubnet=# 


I tried to disable the use of hash join to see what might happen.  This
causes the optimizer to use a merge join.  The timings are worse.
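The post does not show the command used; presumably it was the per-session planner toggle that 7.3 provides, along the lines of:

```sql
-- Sketch: enable_hashjoin is the planner method setting for this in 7.3.
SET enable_hashjoin = off;
explain analyze
select fiscalyearquarter, description, sum(amount_quantity)
from time t, revenue r, statistic s
where t.fiscalyear = 2002
  and r.timekey = t.timekey
  and r.statisticskey = s.statisticskey
group by fiscalyearquarter, description;
SET enable_hashjoin = on;
```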

Here is the plan for that:

 
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Aggregate  (cost=665570.44..667963.57 rows=23931 width=48) (actual time=362121.97..381081.18 rows=8 loops=1)
   ->  Group  (cost=665570.44..667365.29 rows=239313 width=48) (actual time=360948.51..376904.14 rows=1082454 loops=1)
         ->  Sort  (cost=665570.44..666168.72 rows=239313 width=48) (actual time=360948.48..363285.85 rows=1082454 loops=1)
               Sort Key: t.fiscalyearquarter, s.description
               ->  Merge Join  (cost=631481.61..635669.60 rows=239313 width=48) (actual time=263257.77..276625.27 rows=1082454 loops=1)
                     Merge Cond: ("outer".statisticskey = "inner".statisticskey)
                     ->  Sort  (cost=631480.58..632078.86 rows=239313 width=32) (actual time=260561.38..264151.04 rows=1082454 loops=1)
                           Sort Key: r.statisticskey
                           ->  Merge Join  (cost=587963.25..610099.74 rows=239313 width=32) (actual time=217380.88..231958.36 rows=1082454 loops=1)
                                 Merge Cond: ("outer".timekey = "inner".timekey)
                                 ->  Sort  (cost=5.62..5.65 rows=12 width=15) (actual time=14.90..14.92 rows=12 loops=1)
                                       Sort Key: t.timekey
                                       ->  Seq Scan on "time" t  (cost=0.00..5.40 rows=12 width=15) (actual time=13.47..14.83 rows=12 loops=1)
                                             Filter: (fiscalyear = 2002::numeric)
                                 ->  Sort  (cost=587957.63..597530.14 rows=3829004 width=17) (actual time=214776.92..224634.94 rows=1455997 loops=1)
                                       Sort Key: r.timekey
                                       ->  Seq Scan on revenue r  (cost=0.00..68454.04 rows=3829004 width=17) (actual time=1.33..31014.95 rows=3829004 loops=1)
                     ->  Sort  (cost=1.03..1.03 rows=2 width=16) (actual time=2696.35..3765.93 rows=541228 loops=1)
                           Sort Key: s.statisticskey
                           ->  Seq Scan on statistic s  (cost=0.00..1.02 rows=2 width=16) (actual time=19.50..19.52 rows=2 loops=1)
 Total runtime: 385939.85 msec


The query plan in Oracle looks like this:
--------------------------------------------------------------------------------
SORT GROUP BY
	HASH JOIN
		MERGE JOIN CARTESIAN
			TABLE ACCESS FULL		DBA_ADMIN.STATISTIC
			BUFFER SORT
				TABLE ACCESS FULL	DBA_ADMIN.TIME
		TABLE ACCESS FULL			DBA_ADMIN.REVENUE

