Ranking values within a query (pseudo-ROWNUM)

From: Jeff Boes <jboes(at)nexcerpt(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Ranking values within a query (pseudo-ROWNUM)
Date: 2004-06-24 18:21:42
Message-ID: 40DB1BB6.4090302@nexcerpt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I hope this helps someone else ... I had struggled some time ago with
attempts to get a rank of values query to work, but then I gave up and
set it aside. I had another reason to attack it, and in between then and
now I learned how to return "setof" values from a function, as well as
how to construct "dynamic" queries inside a function.

Returning the top 10 values from a query is no big deal:

create table my_table (field1 integer, field2 integer, field3 text);
...
select * from my_table order by field1 limit 10;

If you want the top value for each value of field1, a DISTINCT works:

select distinct on (field1) * from my_table
order by field1, field2 desc;

(i.e., the row with the biggest field2 value for each set of rows with
the same field1 value).

However, if you want the top *N* values, it gets a lot trickier.
PostgreSQL lacks the concept of ROWNUM, which would let you filter in
just those rows in the top 5, for example.

Here's what I came up with; I've not applied this to more than a couple
of test cases, so there may very well be flaws in this approach.

create table rank_of_values(rank_of integer, the_value integer);

(The table is necessary so that our functions can return a rowtype.
Someday PostgreSQL may have a way to add a rowtype to the database
without a corresponding table.)

Now there are two functions, with the same name (and Pg isn't confused,
because they have different numbers of arguments).

The first function is a straightforward ranking of values. Given a
column name, and a "FROM" clause, it returns a set of rows with the
column value and rank (tied values do NOT have the same rank).

CREATE FUNCTION fn_rank_values(TEXT,TEXT)
RETURNS setof rank_of_values AS '
DECLARE
t RECORD;
r rank_of_values%ROWTYPE;
curs REFCURSOR;
col ALIAS FOR $1;
stmt ALIAS FOR $2;
rank INTEGER;
BEGIN
OPEN curs FOR EXECUTE 'SELECT "' || col || '" AS
"the_value" ' || stmt;
rank := 1;
LOOP
FETCH curs INTO t;
EXIT WHEN NOT FOUND;
r.rank_of = rank;
r.the_value = t.the_value;
RETURN next r;
rank := 1 + rank;
END LOOP;
CLOSE curs;
RETURN;
END;
' LANGUAGE 'plpgsql' STABLE;

The second function is the "rank-within-group" facility. Given a
value-column, a group-column, and a "FROM" clause, it returns the
original value, and its rank within the group formed by identical values
of the group-column.

CREATE FUNCTION fn_rank_values(TEXT,TEXT,TEXT)
RETURNS setof rank_of_values AS '
DECLARE
t RECORD;
r rank_of_values%ROWTYPE;
curs REFCURSOR;
col ALIAS FOR $1;
grp ALIAS FOR $2;
clause ALIAS FOR $3;
rank INTEGER;
curr_grp INTEGER;
stmt TEXT;
BEGIN
stmt := 'SELECT "' || col || '" AS "the_value", "' || grp
|| '" AS "the_group" ' || clause;
OPEN curs FOR EXECUTE stmt;
rank := 1;
LOOP
FETCH curs INTO t;
EXIT WHEN NOT FOUND;
IF curr_grp IS NULL
THEN
curr_grp = t.the_group;
ELSIF curr_grp != t.the_group
THEN
curr_grp = t.the_group;
rank = 1;
END IF;
r.rank_of = rank;
r.the_value = t.the_value;
RETURN next r;
rank := 1 + rank;
END LOOP;
CLOSE curs;
RETURN;
END;
' LANGUAGE 'plpgsql' STABLE;

Now you can do things like this:

select rank_of, the_value from fn_rank_values('field1','from mytable
order by field1') order by rank_of;

which will give you the rank number and original value of each row.

The second function is more interesting, because that's where you get
the ability to do "top N by X" kinds of queries. For instance,

select field1, field2, field3 from my_table
join fn_rank_values('field1','field2','from my_table order by field1,
field2') on (field1=the_value)
where rank_of <= 5 order by field2, rank_of;

--
Jeff Boes vox 269.226.9550 ext 24
Database Engineer fax 269.349.9076
Nexcerpt, Inc. http://www.nexcerpt.com
...Nexcerpt... Extend your Expertise

Browse pgsql-general by date

  From Date Subject
Next Message Jeffrey Melloy 2004-06-24 18:23:14 Renaming a schema
Previous Message Joshua D. Drake 2004-06-24 17:50:09 Re: Possible To Modify Part of String in Certain Rows?