Re: How pull

From: Filip Rembiałkowski <plk(dot)zuber(at)gmail(dot)com>
To: "Matt Magoffin" <postgresql(dot)org(at)msqr(dot)us>
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: How pull
Date: 2007-09-24 01:15:47
Message-ID: 92869e660709231815pe28db6cp6d1f132f16f2c181@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

On 23/09/2007, Matt Magoffin <postgresql(dot)org(at)msqr(dot)us> wrote:
> Hello,
>
> My SQL skills are limited and I'm struggling with a query where I want to
> return a single item of an aggregate join. The query looks like this:
>
> select
> (case
> when agg.avg_rating is null then 0.0
> when agg.avg_rating < 0.75 then 0.5
> when agg.avg_rating < 1.25 then 1.0
> when agg.avg_rating < 1.75 then 1.5
> when agg.avg_rating < 2.25 then 2.0
> when agg.avg_rating < 2.75 then 2.5
> when agg.avg_rating < 3.25 then 3.0
> when agg.avg_rating < 3.75 then 3.5
> when agg.avg_rating < 4.25 then 4.0
> when agg.avg_rating < 4.75 then 4.5
> else 5.0
> end) as avg_rating,
> count(item.itemid) as item_count
> from media_item item
> inner join (
> select rating.mediaitem_userrating_hjid as ritemid,
> avg(rating.rating) as avg_rating
> from media_item_rating rating, media_item item
> where rating.mediaitem_userrating_hjid = item.itemid
> group by rating.mediaitem_userrating_hjid
> ) as agg
> on item.itemid = agg.ritemid
> group by avg_rating
> order by avg_rating desc
>
> and a sample of results is this:
>
> avg_rating | item_count
> ------------+------------
> 5.0 | 21
> 4.0 | 33
> 3.0 | 13
> 2.0 | 4
> 1.0 | 1
>
> What I want as well is the ID of the item (and possibly it's avg_rating
> value) from the "agg" join with the highest avg_rating for each output
> row... something like this
>
> avg_rating | item_count | item_id | item_rating
> ------------+-----------------------------------
> 5.0 | 21 | 109890 | 4.9
> 4.0 | 33 | 89201 | 4.1
> 3.0 | 13 | 119029 | 2.8
> 2.0 | 4 | 182999 | 2.2
> 1.0 | 1 | 1929 | 1.0
>
> So the intention in this example is that item #109890 has an average
> rating of 4.9 and that is the highest rating within the > 4.75 rating
> group.
>
> If anyone had any tips I'd greatly appreciate it.
>

create ranking function to make queries look simpler:

create or replace function ranking_group(numeric) returns numeric as
$$ select case
when $1 < 0.3456 then 'quite small'
...
end $$ language sql immutable;
(I'd make it STRICT, but you allow null rankings)

1st way: DISTINCT ON + subquery

select *, (select count(*) from rating where rating_group(rating) =
subq.rating_group ) as rating_group_size
from (
select distinct on (rating_group)
rating_group(r.rating),
r.item_id as best_rated_item_id,
r.rating as best_rating
from rating r
order by rating_group desc, r.rating desc
) subq;

2nd way (faster - actually 2 x faster)
using FIRST aggregate to calculate all in one pass

create function first(numeric,numeric) returns numeric as 'select $1'
language sql immutable strict;
create function first(integer,integer) returns integer as 'select $1'
language sql immutable strict;
create aggregate first( integer ) ( SFUNC = first, STYPE = integer );
create aggregate first ( numeric ) ( SFUNC = first, STYPE = numeric );

select
rating_group(rating),
count(*) as num_ratings,
first(item_id) as best_rated_item_id,
first(rating) as best_rating
from ( select * from rating order by rating desc ) ordered_ratings
group by rating_group
order by rating_group desc;

note: if you can, get rid of null ratings. what are they supposed to
mean? they make things a bit more complicated.

--
Filip Rembiałkowski

In response to

  • How pull at 2007-09-23 22:26:36 from Matt Magoffin

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Erik Jones 2007-09-24 04:56:24 Re: Many databases
Previous Message Matt Magoffin 2007-09-23 22:26:36 How pull