Re: Selecting K random rows - efficiently!

From:	cluster <skrald(at)amossen(dot)dk>
To:	pgsql-general(at)postgresql(dot)org
Subject:	Re: Selecting K random rows - efficiently!
Date:	2007-10-24 13:47:22
Message-ID:	ffnid8$1q2t$1@news.hub.org
Lists:	pgsql-general

> How important is true randomness?

The goal is an even distribution, but so far I have not seen any way
to produce any kind of random sample efficiently. Note the word
"efficiently". The naive way of taking a random sample of size K:
  SELECT * FROM mydata ORDER BY random() LIMIT <K>
is clearly not an option for performance reasons: it has to read every
row of the table and sort on random() just to return K rows. It
shouldn't be necessary to explain that further. :-)
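
To make the cost of the naive query concrete, here is a minimal sketch in Python (not PostgreSQL internals) that models what ORDER BY random() LIMIT K has to do: draw one random key per row, then keep the K rows with the smallest keys. The function name naive_sample, the counter keys_drawn, and the 100,000-element list standing in for mydata are all assumptions made for illustration.

```python
import heapq
import random


def naive_sample(rows, k, rng=random):
    """Model of ORDER BY random() LIMIT k (illustrative only).

    Returns the sample and the number of random keys that had to be drawn.
    """
    keys_drawn = 0
    keyed = []
    for row in rows:
        # Full scan: every row gets a random sort key, wanted or not.
        keyed.append((rng.random(), row))
        keys_drawn += 1
    # Keep only the k rows with the smallest keys (a top-k selection).
    smallest = heapq.nsmallest(k, keyed, key=lambda pair: pair[0])
    sample = [row for _, row in smallest]
    return sample, keys_drawn


# Hypothetical stand-in for the mydata table.
rows = list(range(100_000))
sample, keys_drawn = naive_sample(rows, 5)
print(len(sample), keys_drawn)
```

Under these assumptions, asking for 5 rows still touches all 100,000: the work grows with the table size, not with K, which is the performance problem described above.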

> Search the archives, there have been solutions proposed before, though
> they probably aren't very quick...

As the subject suggests, performance really matters, and searching the
archives only turns up poor solutions (my first post explains why).

In response to

Re: Selecting K random rows - efficiently! at 2007-10-24 13:08:11 from Martijn van Oosterhout

Responses

Re: Selecting K random rows - efficiently! at 2007-10-26 04:40:35 from Patrick TJ McPhee
