Re: aggregation question

From: Samuel Gendler <sgendler(at)ideasculptor(dot)com>
To: pgsql-sql(at)postgresql(dot)org
Subject: Re: aggregation question
Date: 2010-12-01 09:06:39
Message-ID: AANLkTik3-kgfGdQx6iKTDW7te5fEFeVZtONQZLqJM-Ut@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

On Tue, Nov 30, 2010 at 7:47 PM, Samuel Gendler
<sgendler(at)ideasculptor(dot)com>wrote:

> I have a fact table with a number of foreign keys to dimension data and
> some measure columns. Let's assume I've got dimension columns a,b,c,d,e,
> and f and measure columns x, and y.
>
> I need to be able to find the value of f, x, and y for the row with
> min(x/y) when e in (1,2) and max(x/y) when e not in (1,2) with grouping by
> a,b,c, and d.
>
> A query which would identify the row in question would look like this:
>
> select min(x/y), a, b, c, d, e from my_table group by a,b,c,d,e order by
> a,b,c,d
>
> But I don't know how to go about also extracting the value of x, y, and f
> for the row which is matched - I suppose I could join to a subquery, but
> that strikes me as probably not being the most efficient way to accomplish
> it.
>
> Actually, it occurs to me that I've potentially got more than one row for
> any given a,b,c,d,e,f tuple, but I want a min for all rows aggregated by f
>
> Basically, for each (a,b,c,d,e) I want to return sum(x), sum(y), and f (as
> well as a,b,c,d,e) for the f with min(sum(x)/sum(y))
>
>
I solved this, but I'm not sure if my solution is ideal

-- select distinct causes multiple rows that tie for rank = 1

-- to condense down to a single row. Outer query also strips

-- any rows from windows with less than 3 entries.

select distinct count, total_ms, time, fk_e,
fk_a, fk_b, fk_c
from (
-- grab count and total_ms for each fk_g, grouped by
-- a,b,c, time, and e. Use rank() window function
-- to rank each row in the window, ordered by total_ms/count
-- and use count(*) as window function to get total rows in
-- the window.
select sum(f.x) as count, sum(f.y) as total_ms,
f.time as time,
f.fk_a, f.fk_b, f.fk_c,
f.fk_e, f.fk_f,
rank() over (w_agg order by sum(f.x)/sum(f.y)) as rank,
count(*) over (w_agg) as rows
from fact_table f
group by 3,4,5,6,7,8
window w_agg as (partition by time,
fk_a, fk_b, fk_c,
fk_e)
) q
where q.rank = 1 and q.rows >= 3

I will need to run this twice - once with rank() ordered descending and the
inner query actually limited by fk_e in (4,14) and once with rank() ordered
ascending for fk_e not in (4,14). It is possible I can put a case
statement in the inner select in order to handle that. I haven't tried it
yet.

Is there a better solution than the outer 'select distinct' in order to only
see rows that have rank = 1 and to never have duplicate rows?

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Lee Hachadoorian 2010-12-01 09:13:52 Re: OT - load a shp file
Previous Message John Fabiani 2010-12-01 06:06:58 OT - load a shp file