Re: Obtaining random rows from a result set

From: Alban Hertroys <alban(at)magproductions(dot)nl>
To: Alban Hertroys <alban(at)magproductions(dot)nl>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Obtaining random rows from a result set
Date: 2007-09-04 15:41:51
Message-ID: 46DD7CBF.7040609@magproductions.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

To follow up on my own post, I came up with a workable solution based on
scrolling cursors. The SP approach didn't work out for me, I didn't
manage to declare a cursor in PL/pgSQL that could be positioned
absolutely (maybe that's due to us still using PG 8.1.something?).

A solution to that would be appreciated.

Anyway, I solved the problem in our application (PHP). I even got a
workable solution to prevent returning the same record more than once.
Here goes:

function randomSet($query, $limit, $uniqueColumn) {

// queries; depends on your DB connector
DECLARE _cur SCROLL CURSOR WITHOUT HOLD FOR $query;
MOVE FORWARD ALL IN _cur;

//GET DIAGNOSTICS _count := ROW_COUNT;
$count = pg_affected_rows();

$uniques = array();
$resultSet = array();
while ($limit > 0 && count($uniques) < $count) {
$idx = random(1, $count);

//query
$record = FETCH ABSOLUTE $idx FROM _cur;

// Skip records with a column value we want to be unique
if (in_array($record[$uniqueColumn], $uniques)
continue;

$uniques[] = $record[$uniqueColumn];
$resultSet[] = $record;
$limit--;
}

// query
CLOSE _cur;

return $resultSet;
}

I hope this is useful to anyone. It worked for us; it is definitely
faster than order by random(), and more random than precalculated column
values. Plus it translates directly to what we are requesting :)

Alban Hertroys wrote:
> I thought of another solution (with only a few calculations of random())
> that can be deployed in existing versions of PG, using a set-returning
> function with a scrolling cursor that accepts the query string as input
> like this (in pseudoish-code):
>
> ----
> create function random(text _query, integer _limit)
> returns set
> volatile
> as $$
> DECLARE
> _cur cursor;
> _cnt bigint;
> _idx integer;
> _rowpos bigint;
>
> _rec record;
> BEGIN
> open _cur for execute query;
> fetch forward all into _rec;
> -- select total nr of records into _cnt
>
> for _idx in 1.._limit loop
> _rowpos := random() * _cnt;
>
> fetch absolute _rowpos into _rec;
> return next _rec;
> end loop;
>
> return;
> END;
> $$
> language 'plpgsql';
> ----

--
Alban Hertroys
alban(at)magproductions(dot)nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
7500 AK Enschede

// Integrate Your World //

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Alban Hertroys 2007-09-04 15:44:52 Re: Obtaining random rows from a result set
Previous Message Nik 2007-09-04 15:38:59 Partition Reindexing