
Re: Query only slow on first run

From: Craig James <craig_james(at)emolecules(dot)com>
To: tmp <skrald(at)amossen(dot)dk>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Query only slow on first run
Date: 2007-11-28 14:56:33
Message-ID: 474D81A1.20301@emolecules.com
Lists: pgsql-performance
tmp wrote:
>> what exactly is that
>> "random_number" column
> 
> A random float that is initialized when the row is created and never 
> modified afterwards. The physical row ordering will clearly not match 
> the random_number ordering. However, other queries use a row ordering 
> by the primary key so I don't think it would make much sense to make the 
> index on random_number a clustering index just in order to speed up this 
> single query.
> 
>>  and why are you desirous of ordering by it?
> 
> In order to simulate a random pick of K rows. See [1].

A trick I have used is to sort on the random column once, create a much smaller table containing just the first N rows (where N is the sample size you want), and query that instead.
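Roughly like this, with made-up names (my_table, random_number) standing in for your own:

  -- One-time setup: take the N rows with the smallest random_number
  -- values into a small side table, and sample from that instead.
  CREATE TABLE my_table_sample AS
      SELECT *
      FROM my_table
      ORDER BY random_number
      LIMIT 1000;          -- N = 1000, or whatever sample size you want

  ANALYZE my_table_sample;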

If you need a different set of N samples each time, you can create a temporary table, put your N random rows into it, run ANALYZE on it, and then join against this smaller table.  Even though you're creating and populating a whole table, the overall performance can be MUCH faster than the plan that Postgres comes up with for the original query. This seems wrong-headed (why shouldn't Postgres be able to be as efficient on its own?), but it works.
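A sketch of that variant, again with hypothetical names, using ORDER BY random() as one simple (if not the cheapest) way to get a fresh sample each run:

  -- Per-run sampling: pull N random row ids into a temp table,
  -- ANALYZE it so the planner has statistics, then join back.
  CREATE TEMP TABLE sample_ids AS
      SELECT id
      FROM my_table
      ORDER BY random()
      LIMIT 1000;

  ANALYZE sample_ids;

  SELECT t.*
  FROM my_table t
  JOIN sample_ids s ON s.id = t.id;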

Craig


