getting the ranks out of items with SHARED

From: Janning Vygen <vygen(at)gmx(dot)de>
To: pgsql-general(at)postgresql(dot)org
Subject: getting the ranks out of items with SHARED
Date: 2005-07-12 08:48:51
Message-ID: 200507121048.52113.vygen@gmx.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi,

in postgresql you have several possibilites to get the rank of items. A thread
earlier this year shows correlated subqueries (not very performant) and other
tricks and techniques to solve the ranking problem:

http://archives.postgresql.org/pgsql-general/2005-05/msg00157.php

The possibility to use a SHARED variable in plperl can be another nice way to
get a rank of items. ( good example showing SHARED in use is at
http://www.varlena.com/varlena/GeneralBits/114.php)

So i wrote two functions for my purpose:

CREATE OR REPLACE function ranking(int4) RETURNS int4 LANGUAGE 'plperl' AS $$
my %this;

$this{'punkte'} = shift;
$this{'ranking'} = $_SHARED{'prev'}{'ranking'}; # defaults to 0
$this{'count'} = $_SHARED{'prev'}{'count'} + 1; # defaults to 1

$this{'ranking'} = $this{'count'} unless
$this{'punkte'} == $_SHARED{'prev'}{'punkte'};

$_SHARED{'prev'} = \%this;
return $this{'ranking'};
$$;

CREATE OR REPLACE FUNCTION reset_ranking() RETURNS void LANGUAGE 'plperl' AS
$$
$_SHARED{'prev'} = undef;
$$;

Nice Thing: the function drops rankings which other ranking solutions in the
given thread can't. Like this:

rank | points
-------------
1 | 10
2 | 9
2 | 9
4 | 8
5 | 7

It drops rank 3 because we have to entries for second rank.

It would be even nice if you can write a genric ranking() function which takes
anyarray as an argument, but as far as i know you can't pass an "anyarray" to
a plperl function, right?

Now i can do the following in plpsql Function which updates a caching table
for me and it works fine:

PERFORM reset_ranking();

CREATE TEMP TABLE ranking AS
SELECT
*,
ranking(r1.gc_gesamtpunkte) AS rank
FROM (
SELECT
mg_name,
gc_gesamtpunkte
FROM temp_gc
ORDER BY gc_gesamtpunkte DESC, mg_name ASC
) AS r1
;

EXECUTE '
UPDATE temp_gc
SET gc_rank = ranking.rank
FROM ranking
WHERE temp_gc.mg_name = ranking.mg_name;
';

DROP TABLE ranking;

Problems arrise when you try to do the select and update step together without
any temporary table in between:

PERFORM reset_ranking();

UPDATE temp_gc SET gc_rank = ranking.rank
FROM (
SELECT
*,
ranking(r1.gc_gesamtpunkte) AS rank
FROM (
SELECT
mg_name,
gc_gesamtpunkte
FROM temp_gc
ORDER BY gc_gesamtpunkte DESC, mg_name ASC
) AS r1
) AS ranking
WHERE temp_gc.mg_name = ranking.mg_name;
';

I have a guess, what happens here: The order of the subselect statement is
dropped by the optimizer because the optimizer doesn't see the "side-effect"
of the ranking function. that's ok because using such functions isn't SQLish,
i guess.

Is there a way to FORCE the optimizer to keep things orders like the sql
statement author wanted it?

kind regards,
janning

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Bjørn T Johansen 2005-07-12 10:22:33 Windows version of PostgreSQL 8?
Previous Message Marco Gaiarin 2005-07-12 08:19:37 7.2 -> 7.4: horrible performance hit!