Re: ORDER BY random() LIMIT 1 slowness

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Gavin M(dot) Roy" <gmr(at)justsportsusa(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: ORDER BY random() LIMIT 1 slowness
Date: 2002-12-17 05:44:13
Message-ID: 4144.1040103853@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

"Gavin M. Roy" <gmr(at)justsportsusa(dot)com> writes:
> SELECT * FROM poetry ORDER BY random() LIMIT 1;
> [ is slow for 35000 rows ]

Yeah. Basically this query is implemented as
(a) select all 35000 rows of "poetry";
(b) compute a random() value for each row;
(c) sort by the random() values;
(d) take the first row, discard the rest.

The good news: this gives you a pretty-durn-random selection. The bad
news: you didn't really care about choosing a random ordering of the
other 34999 rows, but it computed one anyway.

This problem's been discussed before, but I've not seen any really
ideal answer. SQL is not designed to offer unpredictable results ;-)

If you have a reasonably-uniformly-distributed index key on the rows,
you can try something like

select * from poetry where key > (scale * random() + min)
order by key limit 1;

(where scale and min are chosen to make the result cover the range of
index keys). Given an index on "key" this should pick a result quickly
via an indexscan+limit plan. However, if the distribution of keys isn't
real uniform then you won't get real random results --- keys just above
larger gaps will be selected with unfairly high probability. You also
have to know the min and max keys accurately.

I recall some folks have suggested generating an actual random column
(ie, something initialized with "default random()" and suitably indexed)
but this seems like a lot of overhead ... you'd have to be mighty
concerned with the exactness of your random selection to put up with
that, I think.

There are some other suggestions in the archives, IIRC.

regards, tom lane

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Martijn van Oosterhout 2002-12-17 05:57:44 Re: How to access deleted but not vacuum'ed tuples?
Previous Message Lamar Owen 2002-12-17 05:33:20 Re: Upgrading 6.5.1