Skip site navigation (1) Skip section navigation (2)

Generating custom statistics rows puzzler...left join on parts ofintervals?

From: Ken Corey <ken(dot)corey(at)atomic-interactive(dot)com>
To: Postgresql Novice List <pgsql-novice(at)postgresql(dot)org>
Subject: Generating custom statistics rows puzzler...left join on parts ofintervals?
Date: 2002-07-22 14:04:34
Message-ID: 1027346704.10808.4807.camel@kenlinux.bithub.org (view raw or flat)
Thread:
Lists: pgsql-novice
Hi All,

Got one of those tricky SQL questions that someone else might be able to
come up with a really slick answer.  I'll be as brief as I can be, but
it's a bit long.  The actual questions are at the bottom... RTFM would
be lovely if pointers are given.  I've searched the archives and docs
and come up blank...

My company provides games and interactive content to internet publishers
on an ASP basis.  This means we've got the raw data to produce
interesting usage statistics.

One of the things that would be quite interesting would be to take, on
an hourly basis, various of certain tables both total and hourly, and
then graph these. I've got an acceptable little java applet that graphcs
things in a quite easy way.  It's not terrifically powerful, but it's
not big, either.

A little background: when a customer puts our game on their site, it's
generally for a period of time, called a 'promotion'.  This promotion is
broken into 'rounds'.  Each round may have 1..n 'games' in it...and
these games can change between rounds.

So, to find out the total usage of all games for a given promotion,
it's:
  select 
    promotion_id,count(promotion_id) 
  from player_games 
  group by promotion_id;

Okay so far. But perhaps we want to see the count of individual games:
  select 
    promotion_id,game_id,count(game_id) 
  from player_games 
  group by promotion_id,game_id;

Again, good enough as it goes.  Now we get tricky...we want to know the
number of times each game is played on an hourly basis.
  select
    promotion_id, game_id, date_trunc('hour',when_inserted),
    count(date_trunc('hour',when_inserted))
  from player_games 
  group by promotion_id,game_id,date_trunc('hour',when_inserted);

Now this will produce something like this:
   promotion_id | game_id |       date_trunc       | count 
--------------+---------+------------------------+-------
           38 |       0 | 2001-11-28 09:00:00+00 |    84
           38 |       0 | 2001-11-28 12:00:00+00 |   372
           38 |       0 | 2001-11-28 13:00:00+00 |   320

Question 1)
-----------
Is the use of 'date_trunc' 3 times as in the above SQL not horribly
inefficient?  Is there a way to only have to do it once in the query?

Question 2) 
-----------
The REAL problem is in the rows returned not having all the hours
represented.  Note that nobody apparently played game 0 between 10 and
11:59 that day.  when the simple java graph applet I mentioned comes
along, it's not smart enough to fill in the hours that aren't there.

So, we need to provide those blank hours somehow.  I can see several
ways, but was hoping there's a really slick SQL method of dealing with
this.

Method 0 (smarts in wrong place) Change the graphic applet so that it
understand date/time stamps, and fills in blanks somehow.  I don't like
this one, because the applet now has to know about the format of what
time looks like, instead of just copying the rows blindly to the page.

Method 1 (brute force) - The servlet that outputs the page gets the rows
that do exist, figures out which ones don't exist and inserts them in
the output stream with 0's and passes the lot to the applet. I don't
like this one, because the servlet now has to know about the format of
what time looks like, instead of just copying the rows blindly to the
page.

Method 2 (overkill) - Create a table called 'time' with a row for each
hour on the clock.  When creating the rows, do a left join on this table
so that hours that don't have any plays would still be represented. I'm
leery about doing left joins like this, as they are tricky, and can be
killers on the database.

Is there a slick use of 'between', 'interval' or some other SQL-ism (or
PostgreSQL-ism) that will give me in effect a left join on all the hours
from start::timestamp to end::timestamp?

-- 
Ken Corey  CTO  http://www.atomic-interactive.com  07720 440 731


Responses

pgsql-novice by date

Next:From: Rory Campbell-LangeDate: 2002-07-22 15:20:35
Subject: Re: use vi mode in psql
Previous:From: Nigel J. AndrewsDate: 2002-07-22 13:00:47
Subject: Re: URGENT! pg_dump doesn't work!

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group