Re: ORDER BY random() LIMIT 1 slowness

From: Jean-Luc Lachance <jllachan(at)nsd(dot)ca>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Gavin M(dot) Roy" <gmr(at)justsportsusa(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: ORDER BY random() LIMIT 1 slowness
Date: 2002-12-17 16:04:37
Message-ID: 3DFF4B15.C8B3A8B9@nsd.ca
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Gavin,

Assuming that you have a serial column rand on poetry and you did not
delete any row,
here is my suggestion:

CREATE TABLE poetry ( rand SERIAL, ... );

SELECT * FROM poetry WHERE rand = (
SELECT int8( curval( 'poetry_rand_seq') * random()));

JLL

Tom Lane wrote:
>
> "Gavin M. Roy" <gmr(at)justsportsusa(dot)com> writes:
> > SELECT * FROM poetry ORDER BY random() LIMIT 1;
> > [ is slow for 35000 rows ]
>
> Yeah. Basically this query is implemented as
> (a) select all 35000 rows of "poetry";
> (b) compute a random() value for each row;
> (c) sort by the random() values;
> (d) take the first row, discard the rest.
>
> The good news: this gives you a pretty-durn-random selection. The bad
> news: you didn't really care about choosing a random ordering of the
> other 34999 rows, but it computed one anyway.
>
> This problem's been discussed before, but I've not seen any really
> ideal answer. SQL is not designed to offer unpredictable results ;-)
>
> If you have a reasonably-uniformly-distributed index key on the rows,
> you can try something like
>
> select * from poetry where key > (scale * random() + min)
> order by key limit 1;
>
> (where scale and min are chosen to make the result cover the range of
> index keys). Given an index on "key" this should pick a result quickly
> via an indexscan+limit plan. However, if the distribution of keys isn't
> real uniform then you won't get real random results --- keys just above
> larger gaps will be selected with unfairly high probability. You also
> have to know the min and max keys accurately.
>
> I recall some folks have suggested generating an actual random column
> (ie, something initialized with "default random()" and suitably indexed)
> but this seems like a lot of overhead ... you'd have to be mighty
> concerned with the exactness of your random selection to put up with
> that, I think.
>
> There are some other suggestions in the archives, IIRC.
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo(at)postgresql(dot)org

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Neil Conway 2002-12-17 16:31:52 Re: 7.3 Prepared statements
Previous Message Shridhar Daithankar 2002-12-17 16:03:21 Re: Server testing.