Re: Tuning questions..

From: Bojan Belovic <bbelovic(at)usa(dot)net>
To: "Michael T(dot) Halligan" <michael(at)echo(dot)com>, Bojan Belovic <bbelovic(at)usa(dot)net>, pgsql-admin(at)postgresql(dot)org
Subject: Re: Tuning questions..
Date: 2001-12-19 19:04:32
Message-ID: 20011220000432.14537.qmail@umdvg008.cms.usa.net
Lists: pgsql-admin

This is an interesting effect that I have run into more than once, and I have a
question about it:
We see that the estimated cost for this query went from roughly 12,000,000 to
about 12,000. Of course, we cannot assume that execution time will be directly
proportional to cost, and the weight factors assigned to disk and CPU are
relative, but why such a "small" improvement in execution time when we lowered
the cost by a factor of 1000?
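
One way to see how loosely cost tracks reality is EXPLAIN ANALYZE, which 7.2
(the version in this thread) introduces: it runs the query and prints actual
row counts and timings next to the estimates. A minimal sketch, reusing the
tables from this thread:

```sql
-- EXPLAIN alone shows only the planner's estimates; EXPLAIN ANALYZE
-- (new in PostgreSQL 7.2) also executes the query and reports actual times.
EXPLAIN ANALYZE
SELECT count(DISTINCT song_id) AS X
FROM ssa_candidate SC, station_subgenre SS
WHERE SC.style_id = SS.style_id
  AND SS.station_id = 48;
```

Comparing the "cost=" numbers against the actual times makes the point: cost is
in abstract page-fetch units, not milliseconds, so a 1000x cost drop need not
mean a 1000x speedup.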

As far as your question goes, I ran a similar query (actually a pair of
queries) on my system and got the cost down from about 250M to 10k - Postgres
also used a hash join (like your Oracle db). It could be the indexes: I have
the join field indexed on both tables (a primary key on one, a secondary index
on the other), and the field used in the WHERE clause has a secondary index as
well (although Postgres doesn't use it anyway - I suppose too many rows share
the same value). The only thing I can think of is to check whether style_id is
indexed on both tables.
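
A quick sketch of how one might check and add those indexes, and experiment
with the hash-join question from below - the index name here is only
illustrative:

```sql
-- See which indexes already exist on the two tables
-- (in psql, \d ssa_candidate shows the same information).
SELECT tablename, indexname FROM pg_indexes
WHERE tablename IN ('ssa_candidate', 'station_subgenre');

-- If style_id is not indexed on ssa_candidate, add a secondary index
-- (the index name is just an example), then refresh planner statistics.
CREATE INDEX ssa_candidate_style_id_idx ON ssa_candidate (style_id);
ANALYZE ssa_candidate;

-- To test whether a hash join really is faster, the planner can be steered
-- for the current session (an experiment, not a production setting):
SET enable_mergejoin = off;
```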

"Michael T. Halligan" <michael(at)echo(dot)com> wrote:
> Thanks, that sped things up a bit, from 7.6 sec. to about 5.5 sec. However,
> the plan still includes a sequential scan on ssa_candidate:
>
> Aggregate (cost=12161.11..12161.11 rows=1 width=35)
>   ->  Merge Join (cost=11611.57..12111.12 rows=19996 width=35)
>         ->  Sort (cost=11488.27..11488.27 rows=99805 width=24)
>               ->  Seq Scan on ssa_candidate sc (cost=0.00..3201.05 rows=99805 width=24)
>         ->  Sort (cost=123.30..123.30 rows=31 width=11)
>               ->  Index Scan using station_subgenre_pk on station_subgenre ss (cost=0.00..122.53 rows=31 width=11)
>
>
> If we run the same query on Oracle (modified slightly for syntax):
>
> SELECT count(DISTINCT song_id) AS X
> FROM ssa_candidate SC,
> station_subgenre SS
> WHERE SC.style_id = SS.style_id
> AND SS.station_id =
>
> with the same data, it runs almost instantaneously. Oracle's plan uses a hash
> join:
>
> SELECT STATEMENT Optimizer=CHOOSE (Cost=63 Card=1 Bytes=15)
>   SORT (GROUP BY)
>     HASH JOIN (Cost=63 Card=8943 Bytes=134145)
>       INDEX (RANGE SCAN) OF STATION_SUBGENRE_PK (UNIQUE) (Cost=3 Card=12 Bytes=84)
>       TABLE ACCESS (FULL) OF SSA_CANDIDATE (Cost=59 Card=60364 Bytes=482912)
>
> Is there some way to convince PostgreSQL to use a hash join?
>
> Bojan Belovic wrote:
>
> >Not sure about tuning, but it seems to me that this query would be much
> >more efficient if it's rewritten like this (especially if the style_id
> >columns on both tables are indexed):
> >
> >SELECT count(DISTINCT song_id) AS X
> >FROM ssa_candidate SC JOIN station_subgenre SS ON SC.style_id = SS.style_id
> >WHERE SS.station_id = 48
> >
> >Please correct me if I'm wrong. Also, Michael, if you give this one a try,
> >could you send me the query plan? I'm just curious.
> >
> >"Michael T. Halligan" <michael(at)echo(dot)com> wrote:
> >
> >>Hi.. I seem to be running into a bottleneck on a query, and I'm not
> >>sure what the bottleneck is.
> >>The machine is a dual-processor P3 750 with 2 GB of (PC100) memory,
> >>and 3 72 GB disks set up in RAID 5. Right now I'm just testing our db
> >>for speed (we're porting from Oracle).. later on we're looking at a
> >>quad Xeon 700 with 16 GB of RAM & 10 drives in hardware RAID 5.
> >>
> >>We've tuned the queries a bit, added some indices, and we got this query
> >>down from about 15 minutes to 7.6 seconds.. but it just seems like we
> >>should be able to get it under a second on this box. It's running the
> >>latest SuSE with a 2.4.16 kernel, ReiserFS, and Postgres 7.2b3. I've
> >>tried many different combinations of buffers, stat collection space,
> >>sort space, etc.; none of them really affect performance..
> >>
> >>When I run this particular query, the only resource that seems to change
> >>is one of the processors, which gets up to about 99% usage.. I've tried
> >>setting Postgres to use up to 1.6 GB of memory, but the postmaster never
> >>seems to get above about 700 MB.. it's not swapping at all, though the
> >>context switching seems to get a bit high (peaking at 150)..
> >>
> >>The query sorts through about 80k rows.. here's the query
> >>--------------------------------------------------
> >>SELECT count(*) FROM (
> >> SELECT DISTINCT song_id FROM ssa_candidate WHERE
> >>style_id IN (
> >> SELECT style_id FROM station_subgenre WHERE
> >>station_id = 48
> >> )
> >> ) AS X;
> >>--------------------------------------------------
> >>and the query plan :
> >>--------------------------------------------------
> >>NOTICE: QUERY PLAN:
> >>
> >>Aggregate (cost=12236300.87..12236300.87 rows=1 width=13)
> >>  ->  Subquery Scan x (cost=12236163.64..12236288.40 rows=4990 width=13)
> >>        ->  Unique (cost=12236163.64..12236288.40 rows=4990 width=13)
> >>              ->  Sort (cost=12236163.64..12236163.64 rows=49902 width=13)
> >>                    ->  Seq Scan on ssa_candidate (cost=0.00..12232269.54 rows=49902 width=13)
> >>                          SubPlan
> >>                            ->  Materialize (cost=122.53..122.53 rows=31 width=11)
> >>                                  ->  Index Scan using station_subgenre_pk on station_subgenre (cost=0.00..122.53 rows=31 width=11)
> >>
> >>EXPLAIN
> >>--------------------------------------------------
> >>
> >>
> >>If anybody has any ideas, I'd be really appreciative..

____________________________________________________________________
Get free e-mail and a permanent address at http://www.amexmail.com/?A=1
