Skip site navigation (1) Skip section navigation (2)

Re: Query only slow on first run

From: tmp <skrald(at)amossen(dot)dk>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Query only slow on first run
Date: 2007-11-28 01:08:40
Message-ID: 474CBF98.6070508@amossen.dk (view raw or flat)
Thread:
Lists: pgsql-performance
> The query's spending nearly all its time in the scan of "posts", and
> I'm wondering why --- doesn't seem like it should take 6400msec to fetch
> 646 rows, unless perhaps the data is just horribly misordered relative
> to the index.   Which may in fact be the case ...

Yes, they probably are. I use the random_number column in order to 
receive a semi random sample subset from the large amount of rows. The 
technique is described in [1]. This subset is later used for some 
statistical investigation, but this is somewhat irrelevant here. In 
order to receive the sample fast, I have made an index on the 
random_number column.

> what exactly is that
> "random_number" column

A random float that is initialized when the row is created and never 
modified afterwards. The physical row ordering will clearly not match 
the random_number ordering. However, other queries uses a row ordering 
by the primary key so I don't think it would make much sense to make the 
index on random_number a clustering index just in order to speed up this 
single query.

>  and why are you desirous of ordering by it?

In order to simulate a random pick of K rows. See [1].

> For that matter, if it is what it sounds like, why is it sane to group
> by it?  You'll probably always get groups of one row ...

For each random_number, another table (question_tags) holds zero or more 
rows satisfying a number of constraints. I need to count(*) the number 
of corresponding question_tag rows for each random_number.

We have primarily two tables of interest here: questions (~100k rows) 
and posts (~400k rows). Each post refers to a question, but only the 
"posts" rows for which the corresponding "question.status = 1" are 
relevant. This reduces the number of relevant question rows to about 
10k. Within the post rows corresponding to these 10k questions I would 
like to pick a random sample of size K.

[1] http://archives.postgresql.org/pgsql-general/2007-10/msg01240.php


In response to

Responses

pgsql-performance by date

Next:From: Shadkam IslamDate: 2007-11-28 09:30:55
Subject: Windows XP selects are very slow
Previous:From: Steinar H. GundersonDate: 2007-11-28 00:28:18
Subject: Re: Query only slow on first run

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group