Query running slower than same on Oracle

From: "Sailer, Denis (YBUSA-CDR)" <Denis(dot)Sailer(at)Yellowbook(dot)com>
To: "'pgsql-performance(at)postgresql(dot)org'" <pgsql-performance(at)postgresql(dot)org>
Subject: Query running slower than same on Oracle
Date: 2003-06-25 21:33:16
Message-ID: 58D6EF5727036048BC9E08ECB711A42092ECD0@crmsmail.corp.ybusa.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

We are evaluating PostgreSQL for a typical data warehouse application. I
have 3 tables below that are part of a Star schema design. The query listed
below runs in 16 seconds on Oracle 9.2 and 3+ minutes on PostgreSQL 7.3.3
Here are the details.

I'm wondering what else can be done to tune this type of query. Is 3
minutes reasonable given the amount of data that is loaded into the 3
tables? Is there anyone else who has made comparisons between Oracle and
PostgreSQL?

----------------------------------------------------------------------------
------------------------------------------------------------------------
Oracle 9.2 is running on a windows/2000 server, 600MHz PIII, 512MB ram
Shared Pool 48MB
Buffer Cache 98MB
Large Pool 8MB
Java Pool 32MB
=========
Total SGA 186MB

----------------------------------------------------------------------------
------------------------------------------------------------------------
PostgreSQL is running on Redhat Linux 7.2, 733MHz PIII processor, 383MB ram.
shared_buffers = 12384 (96 MB)
sort_mem = 16384

----------------------------------------------------------------------------
------------------------------------------------------------------------

explain analyze
select fiscalyearquarter, description, sum(amount_quantity)
from time t, revenue r, statistic s
Where t.fiscalyear = 2002
and r.timekey = t.timekey
and r.statisticskey = s.statisticskey
group by fiscalyearquarter, description;

QUERY
PLAN
----------------------------------------------------------------------------
------------------------------------------------------------------------
Aggregate (cost=124685.74..127078.87 rows=23931 width=48) (actual
time=170682.53..189640.85 rows=8 loops=1)
-> Group (cost=124685.74..126480.59 rows=239313 width=48) (actual
time=169508.49..185478.90 rows=1082454 loops=1)
-> Sort (cost=124685.74..125284.02 rows=239313 width=48) (actual
time=169508.47..171853.03 rows=1082454 loops=1)
Sort Key: t.fiscalyearquarter, s.description
-> Hash Join (cost=6.46..94784.90 rows=239313 width=48)
(actual time=140.20..47685.46 rows=1082454 loops=1)
Hash Cond: ("outer".statisticskey =
"inner".statisticskey)
-> Hash Join (cost=5.43..90595.90 rows=239313
width=32) (actual time=139.96..39672.76 rows=1082454 loops=1)
Hash Cond: ("outer".timekey = "inner".timekey)
-> Seq Scan on revenue r (cost=0.00..68454.04
rows=3829004 width=17) (actual time=0.01..26336.95 rows=3829004 loops=1)
-> Hash (cost=5.40..5.40 rows=12 width=15)
(actual time=0.79..0.79 rows=0 loops=1)
-> Seq Scan on "time" t (cost=0.00..5.40
rows=12 width=15) (actual time=0.36..0.75 rows=12 loops=1)
Filter: (fiscalyear = 2002::numeric)
-> Hash (cost=1.02..1.02 rows=2 width=16) (actual
time=0.04..0.04 rows=0 loops=1)
-> Seq Scan on statistic s (cost=0.00..1.02
rows=2 width=16) (actual time=0.02..0.03 rows=2 loops=1)
Total runtime: 195409.79 msec

This gives you an idea of the size of each table in the query
----------------------------------------------------------------------------
------------------------------------------------------------------------

pubnet=# vacuum analyze verbose revenue;
INFO: --Relation dw.revenue--
INFO: Pages 30164: Changed 0, Empty 0; Tup 3829004: Vac 0, Keep 0, UnUsed
17.
Total CPU 1.87s/0.73u sec elapsed 9.97 sec.
INFO: Analyzing dw.revenue
VACUUM
pubnet=# vacuum analyze verbose statistic;
INFO: --Relation dw.statistic--
INFO: Pages 1: Changed 0, Empty 0; Tup 2: Vac 0, Keep 0, UnUsed 1.
Total CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO: Analyzing dw.statistic
VACUUM
pubnet=# vacuum analyze verbose time;
INFO: --Relation dw.time--
INFO: Pages 3: Changed 0, Empty 0; Tup 192: Vac 0, Keep 0, UnUsed 33.
Total CPU 0.00s/0.00u sec elapsed 0.02 sec.
INFO: Analyzing dw.time
VACUUM
pubnet=#

I tried to disable the use of hash join to see what might happen. This
causes the optimizer to use a merge join. The timings are worse.

Here is the plan for that


QUERY PLAN

----------------------------------------------------------------------------
----------------------------------------------------------------------------
--------
Aggregate (cost=665570.44..667963.57 rows=23931 width=48) (actual
time=362121.97..381081.18 rows=8 loops=1)
-> Group (cost=665570.44..667365.29 rows=239313 width=48) (actual
time=360948.51..376904.14 rows=1082454 loops=1)
-> Sort (cost=665570.44..666168.72 rows=239313 width=48) (actual
time=360948.48..363285.85 rows=1082454 loops=1)
Sort Key: t.fiscalyearquarter, s.description
-> Merge Join (cost=631481.61..635669.60 rows=239313
width=48) (actual time=263257.77..276625.27 rows=1082454 loops=1)
Merge Cond: ("outer".statisticskey =
"inner".statisticskey)
-> Sort (cost=631480.58..632078.86 rows=239313
width=32) (actual time=260561.38..264151.04 rows=1082454 loops=1)
Sort Key: r.statisticskey
-> Merge Join (cost=587963.25..610099.74
rows=239313 width=32) (actual time=217380.88..231958.36 rows=1082454
loops=1)
Merge Cond: ("outer".timekey =
"inner".timekey)
-> Sort (cost=5.62..5.65 rows=12
width=15) (actual time=14.90..14.92 rows=12 loops=1)
Sort Key: t.timekey
-> Seq Scan on "time" t
(cost=0.00..5.40 rows=12 width=15) (actual time=13.47..14.83 rows=12
loops=1)
Filter: (fiscalyear =
2002::numeric)
-> Sort (cost=587957.63..597530.14
rows=3829004 width=17) (actual time=214776.92..224634.94 rows=1455997
loops=1)
Sort Key: r.timekey
-> Seq Scan on revenue r
(cost=0.00..68454.04 rows=3829004 width=17) (actual time=1.33..31014.95
rows=3829004 loops=1)
-> Sort (cost=1.03..1.03 rows=2 width=16) (actual
time=2696.35..3765.93 rows=541228 loops=1)
Sort Key: s.statisticskey
-> Seq Scan on statistic s (cost=0.00..1.02
rows=2 width=16) (actual time=19.50..19.52 rows=2 loops=1)
Total runtime: 385939.85 msec

The Query plan in Oracle looks like this...
----------------------------------------------------------------------------
----------------------------------------------------------------------------
--------
SORT GROUP BY
HASH JOIN
MERGE JOIN CARTESIAN
TABLE ACCESS FULL DBA_ADMIN
STATISTIC
BUFFER SORT
TABLE ACCESS FULL DBA_ADMIN
TIME
TABLE ACCESS FULL DBA_ADMIN
REVENUE

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2003-06-25 21:50:45 Re: Query running slower than same on Oracle
Previous Message Fernando Papa 2003-06-25 19:25:44 Similar querys, better execution time on worst execution plan