From: | Samuel Gendler <sgendler(at)ideasculptor(dot)com> |
---|---|
To: | pgsql-sql(at)postgresql(dot)org |
Subject: | Re: aggregation question |
Date: | 2010-12-01 09:06:39 |
Message-ID: | AANLkTik3-kgfGdQx6iKTDW7te5fEFeVZtONQZLqJM-Ut@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-sql |
On Tue, Nov 30, 2010 at 7:47 PM, Samuel Gendler
<sgendler(at)ideasculptor(dot)com>wrote:
> I have a fact table with a number of foreign keys to dimension data and
> some measure columns. Let's assume I've got dimension columns a,b,c,d,e,
> and f and measure columns x, and y.
>
> I need to be able to find the value of f, x, and y for the row with
> min(x/y) when e in (1,2) and max(x/y) when e not in (1,2) with grouping by
> a,b,c, and d.
>
> A query which would identify the row in question would look like this:
>
> select min(x/y), a, b, c, d, e from my_table group by a,b,c,d,e order by
> a,b,c,d
>
> But I don't know how to go about also extracting the value of x, y, and f
> for the row which is matched - I suppose I could join to a subquery, but
> that strikes me as probably not being the most efficient way to accomplish
> it.
>
> Actually, it occurs to me that I've potentially got more than one row for
> any given a,b,c,d,e,f tuple, but I want a min for all rows aggregated by f
>
> Basically, for each (a,b,c,d,e) I want to return sum(x), sum(y), and f (as
> well as a,b,c,d,e) for the f with min(sum(x)/sum(y))
>
>
I solved this, but I'm not sure if my solution is ideal
-- select distinct causes multiple rows that tie for rank = 1
-- to condense down to a single row. Outer query also strips
-- any rows from windows with less than 3 entries.
select distinct count, total_ms, time, fk_e,
fk_a, fk_b, fk_c
from (
-- grab count and total_ms for each fk_g, grouped by
-- a,b,c, time, and e. Use rank() window function
-- to rank each row in the window, ordered by total_ms/count
-- and use count(*) as window function to get total rows in
-- the window.
select sum(f.x) as count, sum(f.y) as total_ms,
f.time as time,
f.fk_a, f.fk_b, f.fk_c,
f.fk_e, f.fk_f,
rank() over (w_agg order by sum(f.x)/sum(f.y)) as rank,
count(*) over (w_agg) as rows
from fact_table f
group by 3,4,5,6,7,8
window w_agg as (partition by time,
fk_a, fk_b, fk_c,
fk_e)
) q
where q.rank = 1 and q.rows >= 3
I will need to run this twice - once with rank() ordered descending and the
inner query actually limited by fk_e in (4,14) and once with rank() ordered
ascending for fk_e not in (4,14). It is possible I can put a case
statement in the inner select in order to handle that. I haven't tried it
yet.
Is there a better solution than the outer 'select distinct' in order to only
see rows that have rank = 1 and to never have duplicate rows?
From | Date | Subject | |
---|---|---|---|
Next Message | Lee Hachadoorian | 2010-12-01 09:13:52 | Re: OT - load a shp file |
Previous Message | John Fabiani | 2010-12-01 06:06:58 | OT - load a shp file |