Re: running totals

From: "Joel Burton" <jburton(at)scw(dot)org>
To: joel Burton <jburton(at)scw(dot)org>
Cc: pgsql-novice(at)postgresql(dot)org
Subject: Re: running totals
Date: 2000-12-05 21:01:37
Message-ID: 3A2D1161.24567.17DDAF6@localhost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

From: <color><param>0000,0000,8000</param>Joel Burton <<jburton(at)scw(dot)org></color>

To: <color><param>0000,0000,8000</param>Mike Castle <<dalgoda(at)ix(dot)netcom(dot)com></color>

<bold>Subject: <color><param>0000,0000,8000</param>Re: [NOVICE] running totals</bold></color>

Date sent: <color><param>0000,0000,8000</param>Tue, 5 Dec 2000 16:01:05 -0500</color>

On 5 Dec 2000, at 11:54, Mike Castle wrote:

<color><param>7F00,0000,0000</param>>

> I'm seeking advise on best way to handle something.

>

> I have a series of containers of a variety of fixed sizes and I want
to

> put items into them, each of which may be different sizes.
(Actually

> it's the length of a video tape and how many hours of tv shows I
can put

> onto it, but it's a pretty general problem).

>

> The question is: is it better to keep a running total on the

> containers and update it every time an item is put into/removed
from the

> container or is it better to do a select every time and recalculate
the

> size of the items in the containers?

>

> Or is this one of those in the class: it depends?

>

> Currently I have a trigger plpgsql function that updates a tape by

> adding/subtracting the show just added/deleted from that tape.
Since

> intervals are integeral (sp?) types, I'm not worried about round off

> errors. If the value was a float, I'd suppose one would want to a

> select at least every so often to account for round off error.

>

> Also, right now I have the running total as part of the object. This

> means that when the object is not in use, that's an extra field that

> isn't being utilized. Not an issue in this case since the containers

> are almost all in use at any given time. But, when a field of a row
is

> updated, is the whole row read from the database, updated, then
written

> back out? If so, would it be that in some cases, it would be
better to

> break out fields that are updated often into their own table with a

> reference? Which is more efficient?

>

> Thanks,

> mrc (Going book shopping tonight for good Tcl/TK and general SQL

> books) --

> Mike Castle Life is like a clock: You can work

> constantly

> <underline><color><param>0000,8000,0000</param>dalgoda(at)ix(dot)netcom(dot)com</underline><color><param>7F00,0000,0000</param> and be right all the time, or not work
at all

> www.netcom.com/~dalgoda/ and be right at least twice a day. --
mrc

> We are all of us living in the shadow of Manhattan. --
Watchmen

</color>Like all good things, this can be an "it depends."

Generally speaking (!), it's a better idea not to store calculated

values in your database b/c if the calculation doesn't happen

(trigger was disabled during testing, etc.), your numbers would be

off, and you might never notice. In addition, you might want to

write a query that looks for *certain* totals (such as only action

films), so a total-all trigger might only be useful for some cases

anyway.

[ Mike -- I'm sure if you're creating triggers, you know basics about

VIEWs. But, since this is pgsql-novice, let me give other some
background.

Skip down to see more about your question]

If your database doesn't handle VIEWs, you're always stuck having

to do a complicated SELECT statement to find the most recent

count of shows on a tape. However, w/a database like PostgreSQL,

you can create a VIEW that shows the tape info, as well as total

shows.

For ex:

CREATE TABLE tape (

id INT NOT NULL PRIMARY KEY,

tape_name TEXT NOT NULL UNIQUE

);

-- assume that shows one-half-hour

-- is runtime '1', one-hour=runtime='2'

-- (this could be floats, or better still,

-- intervals, etc., but for simplicity:)

CREATE TABLE tv (

title TEXT PRIMARY KEY,

runtime INT NOT NULL

CHECK (runtime BETWEEN 1 AND 10),

tape_id INT NOT NULL REFERENCES tape

);

INSERT INTO tape VALUES (1, 'Simpsons');

INSERT INTO tape VALUES (2, 'Crime shows');

INSERT INTO tv VALUES ('Simpsons 3-eyed fish episode', 1, 1);

INSERT INTO tv VALUES ('Simpsons Lisa''s Crush on Nelson episode',

1, 1);

INSERT INTO tv VALUES ('Law & Order Episode 23', 1, 2);

You could always

SELECT id, tape_name, sum(runtime)

FROM tape t,

tv v

WHERE t.id = v.tape_id

GROUP BY id, tape_name;

But that's a pain to do every time! Instead,

CREATE VIEW tape_view AS

SELECT id, tape_name, sum(runtime)

FROM tape t,

show s

WHERE t.id = s.tape_id

GROUP BY id, tape_name;

Which means you can just

SELECT * FROM tape_view;

to see that same answer. Yes, it's more computationally intensive

than doing the trigger and having a regular SELECT, but, for most of
us

most of the time, the speed difference here is not a big deal at all.

[ Back to your question, Mike ]

I'd let your judge be

(a) how often do your insert/update data

(ie how often will your slow TRIGGER run?)

versus

(b) how often do you need to SELECt the

answer (ie how often does semi-slow VIEW run?)

considering

(c) how critical is it that the answer be absolutely

correct (in case trigger crashes, is disabled, etc.)

Of course, with a trigger, you could check that the total running

time doesn't exceed the tape running time, etc. Those things could

be done otherwise (by checking in the trigger, even if you weren't

storing the total), but since you're already in the trigger and already

know the total, there wouldn't be much speed loss.

Most of the time, I'd choose to calculate, unless speed for SELECTs

was really the primary issue.

--

If you do store the totals, make sure that the total field can't be

edited! That would be a critical mistake. The contrib/ directory of

PostgreSQL has an addon called (I think) noupdate, which can be

used to prevent updates to one column. Also, just be careful with

your trigger--if you DROP the table and reCREATE it, remember to

recreate your trigger, etc. It can be easy to forget these!

You might also want to run a SELECT query now-and-then to

compare your computed-as-you-go totals with a SUM() query, as

above, to make sure triggers are 100% working.

--

General advice about when to normalize, etc., can be found in a

good database theory book. I have two:

* SQL for Smarties (Celko). [terse and not always 100% clear, but

many helpful ideas about SQL and database design.]

* Introduction to Database Systems (Date). [hardly a true

beginners intro, but a very good book IMO on database theory.]

I'm not sure I'd say either book was perfect; others may have

better recommendations.

Usually for me, the choice of how to structure data is about 5%

space consideration, 15% speed, and about 70% minimizing

duplicates/data mismatches/ensure query ability, and 10% easiest-

to-do.

Good luck,

<nofill>
--
Joel Burton, Director of Information Systems -*- jburton(at)scw(dot)org
Support Center of Washington (www.scw.org)

Attachment Content-Type Size
unknown_filename text/enriched 6.8 KB

In response to

Responses

Browse pgsql-novice by date

  From Date Subject
Next Message GH 2000-12-06 00:45:57 Re: inner join
Previous Message Joel Burton 2000-12-05 20:28:12 Re: inner join