Re: Major differences between oracle and postgres performance - what can I do ?

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Gary Cowell <gary_cowell(at)yahoo(dot)co(dot)uk>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Major differences between oracle and postgres performance - what can I do ?
Date: 2004-06-18 17:53:17
Message-ID: 20040618175317.GN11196@ns.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

* Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> Stephen Frost <sfrost(at)snowman(dot)net> writes:
> > * Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> >> [... thinks for awhile ...] It seems possible that they may use sort
> >> code that knows it is performing a DISTINCT operation and discards
> >> duplicates on sight. Given that there are only 534 distinct values,
> >> the sort would easily stay in memory if that were happening.
>
> > Could this optimization be added to PostgreSQL? It sounds like a very
> > reasonable thing to do.
>
> That's what I was wondering about too. But first I'd like to get
> some kind of reading on how effective it would be. If someone can
> demonstrate that Oracle can do sort-and-drop-dups a lot faster than
> it can do a straight sort of the same amount of input data, that
> would be a strong indication that it's worth doing. At this point
> we don't know if that's the source of their win or not.

Alright, I did a couple tests, these are different systems with
different hardware, but in the end I think the difference is clear:

tsf=# explain analyze select distinct access_type_id from p_gen_dom_dedicated_swc_access ;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=321591.00..333205.56 rows=16 width=10) (actual time=32891.141..37420.429 rows=16 loops=1)
-> Sort (cost=321591.00..327398.28 rows=2322912 width=10) (actual time=32891.137..35234.810 rows=2322912 loops=1)
Sort Key: access_type_id
-> Seq Scan on p_gen_dom_dedicated_swc_access (cost=0.00..55492.12 rows=2322912 width=10) (actual time=0.013..3743.470 rows=2322912 loops=1)
Total runtime: 37587.519 ms
(5 rows)

tsf=# explain analyze select access_type_id from p_gen_dom_dedicated_swc_access order by access_type_id;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=321591.00..327398.28 rows=2322912 width=10) (actual time=32926.696..35278.847 rows=2322912 loops=1)
Sort Key: access_type_id
-> Seq Scan on p_gen_dom_dedicated_swc_access (cost=0.00..55492.12 rows=2322912 width=10) (actual time=0.014..3753.443 rows=2322912 loops=1)
Total runtime: 36737.628 ms
(4 rows)

So, about the same from postgres in each case. From Oracle:

(select access_type_id from p_gen_dom_dedicated_swc_access order by access_type_id)
sauron:/home/sfrost> time sqlplus mci_vendor/mci @test.sql > /dev/null

real 3m55.12s
user 2m25.87s
sys 0m10.59s

(select distinct access_type_id from p_gen_dom_dedicated_swc_access)
sauron:/home/sfrost> time sqlplus mci_vendor/mci @test.sql > /dev/null

real 0m5.08s
user 0m3.86s
sys 0m0.95s

All the queries were run multiple times, though there wasn't all that
much difference in the times. Both systems are pretty speedy, but I
tend to feel the Postgres box is faster in CPU/disk access time, which
is probably why the Oracle system took 4 minutes to do what the Postgres
systems does in 40 seconds. My only other concern is the Oracle system
having to do the write I/O while the postgres one doesn't... I don't
see an obvious way to get around that though, and I'm not sure if it'd
really make *that* big of a difference.

Stephen

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Scott Marlowe 2004-06-18 18:07:20 Re: [BULK] Problems with vacuum!
Previous Message Tom Lane 2004-06-18 17:01:51 Re: Major differences between oracle and postgres performance - what can I do ?