Re: Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL

From: Richard Huxton <dev(at)archonet(dot)com>
To: Tory M Blue <tmblue(at)gmail(dot)com>
Cc: Matthew Wakeling <matthew(at)flymine(dot)org>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL
Date: 2010-01-25 10:54:30
Message-ID: 4B5D7866.5090900@archonet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 22/01/10 19:06, Tory M Blue wrote:

> Here is the explain plan for the query. Actual rows that the query
> returns is 6369

Actually, it processes 19,799 rows (see the actual rows= below).

> SLOW

> " -> Bitmap Heap Scan on userstats (cost=797.69..118850.46
> rows=13399 width=8) (actual time=281.604..31190.290 rows=19799
> loops=1)"

> "Total runtime: 31219.536 ms"

> FAST

> " -> Bitmap Heap Scan on userstats a (cost=802.66..118855.43
> rows=33276 width=23) (actual time=55.400..3807.908 rows=2606 loops=1)"

> "Total runtime: 3813.626 ms"

OK - so the first query processes 19,799 rows in 31,219 ms (about 1.5ms
per row)

The second processes 2,606 rows in 3,813 ms (about 1.3ms per row).

You are asking for DISTINCT user-ids, so it's seems reasonable that it
will take slightly longer to check a larger set of user-ids.

Otherwise, both queries are the same. I'm still a little puzzled by the
bitmap scan, but the planner probably knows more about your data than I do.

The main time is spent in the "bitmap heap scan" which is where it's
grabbing actual row data (and presumably building a hash over the uid
column). you can see how long in the "actual time" the first number
(e.g. 281.604) is the time spent before it starts, and the second is the
total time at finish (31190.290). If "loops" was greater than 1 you
would multiply the times by the number of loops to get a total.

So - there's nothing "wrong" in the sense that the second query does the
same as the first. Let's take a step back. What you really want is your
reports to be faster.

You mentioned you were running this query thousands of times with a
different "makeid" each time. Running it once for all possible values
and stashing the results in a temp table will probably be *much* faster.
The planner can just scan the whole table once and build up its results
as it goes.

--
Richard Huxton
Archonet Ltd

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Matthew Wakeling 2010-01-25 11:59:41 Re: Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL
Previous Message A. Kretschmer 2010-01-25 10:13:27 Re: Sql result b where condition