Re: Obtaining random rows from a result set

From: Erik Jones <erik(at)myemma(dot)com>
To: kaloyan(at)digsys(dot)bg
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Obtaining random rows from a result set
Date: 2007-08-31 16:05:45
Message-ID: 1F87CA98-2DFD-4551-812C-1B8F55506D09@myemma.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Aug 31, 2007, at 8:34 AM, Kaloyan Iliev wrote:

> Alban Hertroys wrote:
>
>> Hello,
>>
>> I've recently been busy improving a query that yields a fixed
>> number of
>> random records matching certain conditions. I have tried all the
>> usual
>> approaches, and although they do work, they're all limited in some
>> way
>> and don't translate really well to what you "want". They're
>> kludges, IMHO.
>>
>> The methods I've tried are explained quite well on
>> http://people.planetpostgresql.org/greg/index.php?/archives/40-
>> Getting-random-rows-from-a-database-table.html
>>
>> All these methods involve calculating a random number for every
>> record
>> in the result set at some point in time, which is really not what I'm
>> trying to model. I think the database should provide some means to
>> get
>> those records, so...
>>
>> Dear Santa,
>>
>> I'd like my database to have functionality analogue to how LIMIT
>> works,
>> but for other - non-sequential - algorithms.
>>
>> I was thinking along the lines of:
>>
>> SELECT *
>> FROM table
>> WHERE condition = true
>> RANDOM 5;
>>
>> Which would (up to) return 5 random rows from the result set, just as
>> LIMIT 5 returns (up to) the first 5 records in the result set.
>>
>>
>> Or maybe even with a custom function, so that you could get non-
>> linear
>> distributions:
>>
>> SELECT *
>> FROM table
>> WHERE condition = true
>> LIMIT 5 USING my_func();
>>
>> Where my_func() could be a user definable function accepting a
>> number
>> that should be (an estimate of?) the number of results being
>> returned so
>> that it can provide pointers to which rows in the resultset will be
>> returned from the query.
>>
>> Examples:
>> * random(maxrows) would return random rows from the resultset.
>> * median() would return the rows in the middle of the result set
>> (this
>> would require ordering to be meaningful).
>>
>> What do people think, is this feasable? Desirable? Necessary?
>>
>> If I'd have time I'd volunteer for at least looking into this, but
>> I'm
>> working on three projects simultaneously already. Alas...
>>
>> Regards,
>> Alban Hertroys.
>>
>>
> Hi,
> Why not generate a random number in your application and then:
>
> SELECT *
> FROM table_x
> WHERE condition = true
> OFFSET generated_random_number
> LIMIT xx
>
> Kaloyan Iliev
>

That won't work without some kind of a priori knowledge of how many
rows the query would return without the offset and limit.

Erik Jones

Software Developer | Emma®
erik(at)myemma(dot)com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Marco Bizzarri 2007-08-31 16:54:32 computing and updating the size of a table with large objects
Previous Message Erik Jones 2007-08-31 15:53:03 Re: URGENT: Whole DB down ("no space left on device")