Re: Selecting random rows efficiently

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: rj(at)last(dot)fm
Cc: PgSQL Performance ML <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Selecting random rows efficiently
Date: 2003-08-30 15:14:04
Message-ID: 25757.1062256444@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

I said:
> 3. Your query now looks like
> SELECT * FROM table WHERE random_id >= random()
> ORDER BY random_id LIMIT 1;

Correction: the above won't give quite the right query because random()
is marked as a volatile function. You can hide the random() call inside
a user-defined function that you (misleadingly) mark stable, or you can
just stick it into a sub-select:

regression=# explain select * from foo WHERE random_id >= (select random())
regression-# ORDER BY random_id LIMIT 1;
QUERY PLAN
-------------------------------------------------------------------------
Limit (cost=0.01..0.15 rows=1 width=8)
InitPlan
-> Result (cost=0.00..0.01 rows=1 width=0)
-> Index Scan using fooi on foo (cost=0.00..45.50 rows=334 width=8)
Index Cond: (random_id >= $0)
(5 rows)

This technique is probably safer against future planner changes,
however:

regression=# create function oneshot_random() returns float8 as
regression-# 'select random()' language sql stable;
CREATE FUNCTION
regression=# explain select * from foo WHERE random_id >= oneshot_random()
regression-# ORDER BY random_id LIMIT 1;
QUERY PLAN
-------------------------------------------------------------------------
Limit (cost=0.00..0.14 rows=1 width=8)
-> Index Scan using fooi on foo (cost=0.00..46.33 rows=334 width=8)
Index Cond: (random_id >= oneshot_random())
(3 rows)

The point here is that an indexscan boundary condition has to use stable
or immutable functions. By marking oneshot_random() stable, you
essentially say that it's okay to evaluate it only once per query,
rather than once at each row.

regards, tom lane

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2003-08-30 15:16:02 Re: Linux2.6 overcommit behaviour
Previous Message Bruce Momjian 2003-08-30 15:11:11 Re: Index creation takes for ever

Browse pgsql-performance by date

  From Date Subject
Next Message Matt Clark 2003-08-30 15:36:20 Re: Hardware recommendations to scale to silly load
Previous Message Tom Lane 2003-08-30 15:02:01 Re: How to force Nested Loop plan?