Quick Links

Re: Optimization, etc

From:	Masaru Sugawara <rk73(at)echna(dot)ne(dot)jp>
To:	pgsql-sql(at)postgresql(dot)org
Subject:	Re: Optimization, etc
Date:	2001-11-17 13:30:09
Message-ID:	20011117165704.36EF.RK73@echna.ne.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-sql

On Fri, 9 Nov 2001 07:57:41 -0800 (PST)
Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com> wrote:

> > On Fri, 9 Nov 2001, Jeff Sack wrote:
> >
> > One separate issue (the reason why the above examples are all about
> > batting statistics) I'm having is representing the innings pitched
> > statistic. The way it is often represented (and the way it is done in
> > this schema) is something like this 123.0 means exactly 123 innings
> > pitched, 123.1 means 123 1/3 innings, and 123.2 means 123 2/3 innings
> > pitched. I'm contemplating the best way to write a function that knows
> > how to sum these values accurately. Is this something that can be done
> > with PL/PGSQL or should I go straight to something like PLPERL?
> > Alternatively, I could research a way to represent fractions in the DB
> > and write a script to convert all values in this column. Any advice
> > here??
>
> You'd probably be best off doing the conversion at insert time into an
> additional field assuming that inserts are much less likely that
> selects on your data.
>
> (col-floor(col)*(10/3::numeric) seems to get back an appropriate value
> but is probably reasonably expensive.

Hi,

I tried to put your nice idea into the GROUP BY clause, since I wanted
to know how expensive it is. There are several players in the players
table and about 60k rows in the pitching_stats, which have only columns
concerned with the SUM() aggregate function. In case of my PC with 333MHz
clock, 256M SDRAM, and ATA33 HDD, the costs of executing the following
queries are about 4 sec and 6 sec, respectively. It seems to me that
they are reasonable. However the ratio of the sort time appears to occupy
quite a bit of time.

CREATE TABLE players (id serial
CONSTRAINT pkey_players
PRIMARY KEY,
first_name varchar(30));
CREATE TABLE pitching_stats(id integer
NOT NULL
CONSTRAINT key_pitching_stats
REFERENCES players(id),
ip numeric(5,1));

----- 1st version. It costs about 4 sec.
SELECT pl.id, pl.first_name, t.ret
FROM (SELECT ps.id, sum(ps.ip-floor(ps.ip)) % 0.3::numeric
+ floor((sum(ps.ip-floor(ps.ip))) / 0.3::numeric)
+ sum(floor(ps.ip)) AS ret
FROM pitching_stats AS ps
GROUP BY ps.id
) AS t INNER JOIN
players AS pl ON (t.id = pl.id)

----- 2nd version. It costs about 6 sec.
SELECT pl.id, pl.first_name,
sum(ps.ip-floor(ps.ip)) % 0.3::numeric
+ floor((sum(ps.ip-floor(ps.ip))) / 0.3::numeric)
+ sum(floor(ps.ip)) AS ret
FROM players AS pl INNER JOIN
pitching_stats AS ps ON (pl.id = ps.id)
GROUP BY pl.id, pl.first_name

QUERY PLAN:
----- 1st version.
Hash Join (cost=6357.01..7197.06 rows=6000 width=32)
-> Subquery Scan t (cost=6355.96..6955.96 rows=6000 width=16)
-> Aggregate (cost=6355.96..6955.96 rows=6000 width=16)
-> Group (cost=6355.96..6505.96 rows=60000 width=16)
-> Sort (cost=6355.96..6355.96 rows=60000 width=16)
-> Seq Scan on pitching_stats ps
(cost=0.00..983.00 rows=60000 width=16)
-> Hash (cost=1.04..1.04 rows=4 width=16)
-> Seq Scan on players pl (cost=0.00..1.04 rows=4 width=16)

----- 2nd version.
Aggregate (cost=9037.33..9787.33 rows=6000 width=32)
-> Group (cost=9037.33..9337.33 rows=60000 width=32)
-> Sort (cost=9037.33..9037.33 rows=60000 width=32)
-> Hash Join (cost=1.05..3384.10 rows=60000 width=32)
-> Seq Scan on pitching_stats ps
(cost=0.00..983.00 rows=6 0000 width=16)
-> Hash (cost=1.04..1.04 rows=4 width=16)
-> Seq Scan on players pl
(cost=0.00..1.04 rows=4 width=16)

regards,
Masaru Sugawara

In response to

Re: Optimization, etc at 2001-11-09 15:57:41 from Stephan Szabo

Browse pgsql-sql by date

	From	Date	Subject
Next Message	Aasmund Midttun Godal	2001-11-17 19:09:59	Re: constraint via selection
Previous Message	Horst Herb	2001-11-17 13:00:57	Re: constraint via selection