Quick Links

Re: Selecting K random rows - efficiently!

From:	"Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com>
To:	"Patrick TJ McPhee" <ptjm(at)interlog(dot)com>
Cc:	pgsql-general(at)postgresql(dot)org
Subject:	Re: Selecting K random rows - efficiently!
Date:	2007-10-30 08:52:21
Message-ID:	162867790710300152n276410f8l345cc401323e8be@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

2007/10/26, Patrick TJ McPhee <ptjm(at)interlog(dot)com>:
> In article <ffnid8$1q2t$1(at)news(dot)hub(dot)org>, cluster <skrald(at)amossen(dot)dk> wrote:
> % > How important is true randomness?
> %
> % The goal is an even distribution but currently I have not seen any way
> % to produce any kind of random sampling efficiently. Notice the word
>
> How about generating the ctid randomly? You can get the number of pages
> from pg_class and estimate the number of rows either using the number
> of tuples in pg_class or just based on what you know about the data.
> Then just generate two series of random numbers, one from 0 to the number
> of pages and the other from 1 to the number of rows per page, and keep
> picking rows until you have enough numbers. Assuming there aren't too
> many dead tuples and your estimates are good, this should retrieve n rows
> with roughly n look-ups.
>
> If your estimates are low, there will be tuples which can never be selected,
> and so far as I know, there's no way to construct a random ctid in a stock
> postgres database, but apart from that it seems like a good plan. If
> efficiency is important, you could create a C function which returns a
> series of random tids and join on that.
> --
>

SELECT id, ...
FROM data
WHERE id = ANY(ARRAY(
SELECT (random()*max_id)::int
FROM generate_series(1,20)))
LIMIT 1;

-- max_id is external constant

Pavel Stehule

In response to

Re: Selecting K random rows - efficiently! at 2007-10-26 04:40:35 from Patrick TJ McPhee

Browse pgsql-general by date

	From	Date	Subject
Next Message	vincent	2007-10-30 09:08:03	Re: Selecting K random rows - efficiently!
Previous Message	Hannes Dorbath	2007-10-30 08:37:07	Re: Raid Chunk Sizes for DSS type DB