Re: Obtaining random rows from a result set

From: Kaloyan Iliev <kaloyan(at)digsys(dot)bg>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Obtaining random rows from a result set
Date: 2007-08-31 13:34:48
Message-ID: 46D818F8.9080006@digsys.bg
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi,
Why not generate a random number in your application and then:

SELECT *
FROM table_x
WHERE condition = true
OFFSET generated_random_number
LIMIT xx

Kaloyan Iliev

Alban Hertroys wrote:

>Hello,
>
>I've recently been busy improving a query that yields a fixed number of
>random records matching certain conditions. I have tried all the usual
>approaches, and although they do work, they're all limited in some way
>and don't translate really well to what you "want". They're kludges, IMHO.
>
>The methods I've tried are explained quite well on
>http://people.planetpostgresql.org/greg/index.php?/archives/40-Getting-random-rows-from-a-database-table.html
>
>All these methods involve calculating a random number for every record
>in the result set at some point in time, which is really not what I'm
>trying to model. I think the database should provide some means to get
>those records, so...
>
>Dear Santa,
>
>I'd like my database to have functionality analogue to how LIMIT works,
>but for other - non-sequential - algorithms.
>
>I was thinking along the lines of:
>
> SELECT *
> FROM table
> WHERE condition = true
> RANDOM 5;
>
>Which would (up to) return 5 random rows from the result set, just as
>LIMIT 5 returns (up to) the first 5 records in the result set.
>
>
>Or maybe even with a custom function, so that you could get non-linear
>distributions:
>
> SELECT *
> FROM table
> WHERE condition = true
> LIMIT 5 USING my_func();
>
> Where my_func() could be a user definable function accepting a number
>that should be (an estimate of?) the number of results being returned so
>that it can provide pointers to which rows in the resultset will be
>returned from the query.
>
>Examples:
>* random(maxrows) would return random rows from the resultset.
>* median() would return the rows in the middle of the result set (this
>would require ordering to be meaningful).
>
>What do people think, is this feasable? Desirable? Necessary?
>
>If I'd have time I'd volunteer for at least looking into this, but I'm
>working on three projects simultaneously already. Alas...
>
>Regards,
>Alban Hertroys.
>
>
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Phoenix Kiula 2007-08-31 13:35:09 Re: URGENT: Whole DB down ("no space left on device")
Previous Message Josh Tolley 2007-08-31 13:33:36 Re: Obtaining random rows from a result set