Re: Selecting K random rows - efficiently!

From: cluster <skrald(at)amossen(dot)dk>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Selecting K random rows - efficiently!
Date: 2007-10-24 13:47:22
Message-ID: ffnid8$1q2t$1@news.hub.org
Lists: pgsql-general

> How important is true randomness?

The goal is an even distribution, but so far I have not seen any way
to produce any kind of random sample efficiently. Note the word
"efficiently". The naive way of taking a random sample of size K,
(SELECT * FROM mydata ORDER BY random() LIMIT <K>)
is clearly not an option for performance reasons. It shouldn't be
necessary to explain why. :-)
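
For anyone skimming the thread, the problem with the naive query is that
it has to read every row of the table and order all of them by a fresh
random value just to keep K of them, so the cost grows with the table
size rather than with K. The sketch below contrasts it with one commonly
suggested workaround, probing random ids through the primary key. It is
only an illustration under stated assumptions, not a method proposed in
this thread: it uses SQLite through Python's sqlite3 module as a stand-in
for PostgreSQL, and the columns "id" and "payload" and the helper
sample_by_random_ids are hypothetical. Only the table name mydata comes
from the query above.

```python
import random
import sqlite3


def sample_by_random_ids(conn, k, rng=random):
    """Return up to k distinct rows by probing random ids via the primary key.

    Assumes mydata has an integer primary key "id" (hypothetical column).
    Misses on gaps are retried, so every existing row stays equally likely,
    but a sparse id range makes this slow.
    """
    lo, hi = conn.execute("SELECT min(id), max(id) FROM mydata").fetchone()
    if lo is None:  # empty table
        return []
    rows = {}
    attempts = 0
    # Cap the attempts so a small or sparse table cannot loop forever.
    while len(rows) < k and attempts < 20 * k:
        attempts += 1
        candidate = rng.randint(lo, hi)
        row = conn.execute(
            "SELECT id, payload FROM mydata WHERE id = ?", (candidate,)
        ).fetchone()
        if row is not None:
            rows[row[0]] = row  # keyed by id, so duplicates collapse
    return list(rows.values())


# Hypothetical demo table: 10000 rows with dense ids 1..10000.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mydata (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO mydata (id, payload) VALUES (?, ?)",
    ((i, f"row {i}") for i in range(1, 10001)),
)

# Naive approach: every row gets a random sort key before LIMIT applies.
naive = conn.execute(
    "SELECT id, payload FROM mydata ORDER BY random() LIMIT 5"
).fetchall()

# Probing approach: about K index lookups when the ids are dense.
probed = sample_by_random_ids(conn, 5)
```

The probing variant only pays off when the id range is reasonably dense.
With large gaps most probes miss, and the attempt cap means it can return
fewer than K rows.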

> Search the archives; there have been solutions proposed before, though
> they probably aren't very quick...

As the subject suggests, performance really matters, and searching the
archives only turns up poor solutions (my first post explains why).
