Re: count(*) and bad design was: Experiences with extensibility

From: Alban Hertroys <dalroi(at)solfertje(dot)student(dot)utwente(dot)nl>
To: Ivan Sergio Borgonovo <mail(at)webthatworks(dot)it>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: count(*) and bad design was: Experiences with extensibility
Date: 2008-01-17 11:08:54
Message-ID: 90E693F2-125B-4554-A919-36C839F29F6C@solfertje.student.utwente.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Jan 15, 2008, at 3:03 PM, Ivan Sergio Borgonovo wrote:

> On Tue, 15 Jan 2008 14:43:35 +0100
> Alban Hertroys <dalroi(at)solfertje(dot)student(dot)utwente(dot)nl> wrote:
>
>> You need to scroll to the last row to find the size of the result
>> set, but after that it's pretty easy to return random rows by
>> scrolling to them (and marking them 'read' in some way to prevent
>> accidentally returning the same row again).
>
> Could you post a snippet of code or something giving a more detailed
> idea of it?
>
> BTW since cursors support offset if you're not interested if the
> order of the retrieved rows is random too you don't even have to
> remember which one you read I think.

I posted it on this list a while ago when I came up with this
solution. I had some trouble finding my old post in the pgsql-general
archives though - I could find the thread, just not my final posting,
and searching didn't even turn up the thread.

I did find it here: http://www.mail-archive.com/pgsql-
general(at)postgresql(dot)org/msg103670.html
The thread contains several other approaches to the problem, it
really depends on your problem domain which one fits your bill.

I think the function in my original posting could do with clearer
comments though, so here's the function again:

/*
* Return $limit random rows from the result set of SQL query $query
*/
function randomSet(
$query, // The query to execute
$limit // The (max) number of random rows required
) {
// SQL to declare the cursor
query("DECLARE _cur SCROLL CURSOR WITHOUT HOLD FOR $query");

/* Get the range for random(1, n)
*
* Determined by scrolling the cursor to the last row.
* Equivalent to select count(*), but without a separate query.
*/
query("MOVE FORWARD ALL IN _cur");
$count = pg_affected_rows();

$uniques = array(); // A list of used cursor offsets
$resultSet = array();

// Fetch random rows until we have enough or there are no more
while ($limit > 0 && count($uniques) < $count) {
// Determine random scroll offset
$idx = random(1, $count);

// Skip records with an index we already used
if (in_array($idx, $uniques))
continue;

//Fetch the random row
$record = query("FETCH ABSOLUTE $idx FROM _cur");

// Add the row offset to the list of used offsets
$uniques[] = $idx;

$resultSet[] = $record;
$limit--;
}

// query
query("CLOSE _cur");

return $resultSet;
}

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.

!DSPAM:737,478f32e59497683469944!

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Sebastián Baioni 2008-01-17 11:51:02 Can't make backup (again)
Previous Message Reg Me Please 2008-01-17 10:48:24 Accessing composite type columns from C