Re: intermittant performance problem

From: marcin mank <marcin(dot)mank(at)gmail(dot)com>
To: Mike Charnoky <noky(at)nextbus(dot)com>
Cc: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general(at)postgresql(dot)org
Subject: Re: intermittant performance problem
Date: 2009-03-29 08:24:48
Message-ID: b1b9fac60903290124r31dc8342s603125bd00d535cf@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I think (a part of) Your problem is that order by random() is O(N
logN) complexity, while You are after O(N) .

The solution (in pseudocode)

random_sample(resultset,K):
result := first K rows from resultset
resultset.scrollto(K+1)
p = K+1
while(resultset.hasMoreRows())
row = resultset.current()
resultset.advance()
if(random() < K/p)
replace a random element in result with row
p = p+1
return result

the invariant being that at each step result contains a random sample
of K elements.

It should be fairly easy to implement in plpgsql.

Greetings
Marcin

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message marcin mank 2009-03-29 08:51:35 Re: intermittant performance problem
Previous Message ray 2009-03-29 04:43:11 Re: Authentication Failed - new user installation