WINDOW functions - proposed addition weight (dp) for percent_rank, cume_dist

From: Michael van der Kolff <mvanderkolff(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: WINDOW functions - proposed addition weight (dp) for percent_rank, cume_dist
Date: 2010-02-08 21:23:49
Message-ID: 8233f01e1002081323x36c28c14y7bd7247c5d5187ff@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Dear all,

I have a bunch of CDRs which I turned into a running list showing how
many were in progress at any one time:

WITH
parameters as (
SELECT
(timestamp with time zone '2010-01-19 00:00:01+11') as beginning,
(timestamp with time zone '2010-01-19 00:00:01+11') + (interval
'24 hours' * 17) as ending
),
call_times as (
SELECT
greatest(calldate,beginning) as callstart,
least(calldate + duration * (interval '1 second'),p.ending) as callend
FROM
cdr, parameters p
where ((calldate BETWEEN beginning AND ending) or ((calldate +
duration * (interval '1 second')) BETWEEN beginning AND ending))
/* AND dcontext <> 'internalNumbers' */ --uncomment for outgoing
call statistics
),
call_starts as (
SELECT
callstart as eventTime,
1::int8 as numInProgress
FROM
call_times
),
call_ends as (
SELECT
callend as eventTime,
-1::int8 as numInProgress
FROM
call_times
),
call_times_numInProgress as (
select period(eventTime, lead(eventTime,1,p.ending) over
byEventTime) as timeSlice, sum(numInProgress) over byEventTime as
callsInProgress
FROM
(select eventTime, numInProgress from call_starts union all select
eventTime, numInProgress from call_ends) events, parameters p
WINDOW byEventTime as (order by eventTime asc)
)
SELECT *, percent_rank() OVER (ORDER BY callsInProgress ASC) AS
percentile FROM call_times_numInProgress ORDER BY callsInProgress DESC

using the asterisk 'standard' definition for CDRs with one extra field
asking whether it's been copied to an accounting program (should do
something more sensible, but it's using firebird for the moment...):
CREATE TABLE cdr
(
acctid bigserial NOT NULL,
calldate timestamp with time zone NOT NULL DEFAULT now(),
clid character varying(45) NOT NULL DEFAULT ''::character varying,
src character varying(45) NOT NULL DEFAULT ''::character varying,
dst character varying(45) NOT NULL DEFAULT ''::character varying,
dcontext character varying(45) NOT NULL DEFAULT ''::character varying,
channel character varying(45) NOT NULL DEFAULT ''::character varying,
dstchannel character varying(45) NOT NULL DEFAULT ''::character varying,
lastapp character varying(45) NOT NULL DEFAULT ''::character varying,
lastdata character varying(45) NOT NULL DEFAULT ''::character varying,
duration integer NOT NULL DEFAULT 0,
billsec integer NOT NULL DEFAULT 0,
disposition character varying(45) NOT NULL DEFAULT ''::character varying,
amaflags integer NOT NULL DEFAULT 0,
accountcode character varying(45) NOT NULL DEFAULT ''::character varying,
uniqueid character varying(45) NOT NULL DEFAULT ''::character varying,
"copiedToSPInfo" boolean DEFAULT false,
CONSTRAINT cdr_pkey PRIMARY KEY (acctid)
)
WITH (
OIDS=FALSE
);

But percent_rank here simply shows the direct percentile. What I
would like is to say percent_rank(length(timeSlice)), which would give
the result of the cumulative sum of the length of each period divided
by the sum of the entire period

So what I propose is a minor extension: Add a weight parameter to
percent_rank/cume_dist/others, which defaults to 1. Current behaviour
should stay the same, AFAICT...

Cheers,

Michael

Browse pgsql-general by date

  From Date Subject
Next Message BlackMage 2010-02-08 22:35:06 Help Join Tables
Previous Message Keaton Adams 2010-02-08 21:10:13 Cache lookup failed for relation message in PG 8.3.7