Re: Selecting K random rows - efficiently!

From: "Scott Marlowe" <scott(dot)marlowe(at)gmail(dot)com>
To: cluster <skrald(at)amossen(dot)dk>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Selecting K random rows - efficiently!
Date: 2007-10-25 03:01:08
Message-ID: dcc563d10710242001o1f008a06u8ae684a67691e68b@mail.gmail.com
Lists: pgsql-general

Here's how I would do it. This assumes a static table that doesn't
change a lot.

1: find the row count n of the table.
2: randomly assign the numbers 1 through n to the rows, in random
order. How to do that is a whole other post.
3: create a sequence. If you always need 10 or 100 random rows, set
the increment to that number, and set it to cycle at the size of the
table.
4: select nextval('sequence'), call the result nv, and use it in a select:

select * from myrandomtable where id between nv and nv + 99; -- i.e.
one less than your increment, since between is inclusive on both ends.
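Here's a rough sketch of how the pieces might fit together (the
table, index, and sequence names are just examples I made up, and
row_number() needs PostgreSQL 8.4 or later):

-- Steps 1 and 2: build a copy of the table with the numbers 1..n
-- handed out in random order (assumes mytable has no "id" column
-- of its own).
CREATE TABLE myrandomtable AS
  SELECT row_number() OVER (ORDER BY random()) AS id, t.*
  FROM mytable t;

CREATE INDEX myrandomtable_id_idx ON myrandomtable (id);

-- Step 3: a cycling sequence that hands out the starting id of
-- each 100-row chunk; set MAXVALUE to the row count n from step 1.
CREATE SEQUENCE myrandom_seq
  INCREMENT BY 100
  MAXVALUE 1000000  -- replace with your actual row count n
  CYCLE;

-- Step 4: grab the next chunk in a single statement, so nextval()
-- is only called once.
SELECT r.*
FROM (SELECT nextval('myrandom_seq') AS nv) s
JOIN myrandomtable r ON r.id BETWEEN s.nv AND s.nv + 99;

(If n isn't an exact multiple of the increment, the last chunk
before the sequence wraps will come up a little short.)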

There are refinements to this. The advantages, with a static data
set, are that you can cluster the table on the randomized id and
fetch chunks of the random data set VERY quickly, and you won't
repeat any rows until the sequence cycles. You can re-randomize the
table every x hours, days, or weeks to meet your needs. If you don't
want to re-randomize it during the day, just load the random data set
into it however many shuffled times you need so that the sequence
won't roll over until the next day/week etc...

Does that make sense?

If your data changes all the time, you've got a more difficult problem
to deal with.
