Skip site navigation (1) Skip section navigation (2)

Re: Selecting random rows efficiently

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: rj(at)last(dot)fm
Cc: PgSQL Performance ML <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Selecting random rows efficiently
Date: 2003-08-30 14:47:42
Message-ID: 23400.1062254862@sss.pgh.pa.us (view raw or flat)
Thread:
Lists: pgsql-hackerspgsql-performance
Richard Jones <rj(at)last(dot)fm> writes:
>>> i have a table of around 3 million rows from which i regularly (twice a
>>> second at the moment) need to select a random row from

> i was hoping there was some trickery with sequences that would allow me to 
> easily pick a random valid sequence number..?

There is no magic bullet here, but if you expect that requirement to
persist then it is worth your trouble to expend effort on a real
solution.  A real solution in my mind would look like

1. Add a column "random_id float8 default random()".  The idea here
   is that you assign a random ID to each row as it is created.

2. Add an index on the above column.

3. Your query now looks like

	SELECT * FROM table WHERE random_id >= random()
	ORDER BY random_id LIMIT 1;

This gives you a plan on the order of

 Limit  (cost=0.00..0.17 rows=1 width=8)
   ->  Index Scan using fooi on foo  (cost=0.00..57.00 rows=334 width=8)
         Filter: (random_id >= random())

which is fast and gives a genuinely random result row.  At least up
until you have enough rows that there start being duplicate random_ids,
which AFAIK would be 2 billion rows with a decent random()
implementation.  If you're concerned about that, you could periodically
re-randomize with
	UPDATE table SET random_id = random();
so that any rows that were "hidden" because they had a duplicate
random_id have another shot at being choosable.  But with only a few mil
rows I don't think you need to worry.

			regards, tom lane

In response to

Responses

pgsql-performance by date

Next:From: Tom LaneDate: 2003-08-30 15:02:01
Subject: Re: How to force Nested Loop plan?
Previous:From: Tom LaneDate: 2003-08-30 14:35:52
Subject: Re: bad estimates

pgsql-hackers by date

Next:From: Bruce MomjianDate: 2003-08-30 15:02:50
Subject: Re: Linux2.6 overcommit behaviour
Previous:From: Richard JonesDate: 2003-08-30 14:25:51
Subject: Re: Selecting random rows efficiently

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group