Re: CPU usage goes to 100%, query seems to ran forever

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Andrus" <eetasoft(at)online(dot)ee>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: CPU usage goes to 100%, query seems to ran forever
Date: 2006-04-28 14:57:00
Message-ID: 900.1146236220@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

"Andrus" <eetasoft(at)online(dot)ee> writes:
> Here it is running in my local computer. I'm expecting run time no more 1
> second

Something seems to have truncated your EXPLAIN output, but anyway we
can see where the problem is:

> " -> Seq Scan on dok (cost=0.00..787.80 rows=1 width=39)
> (actual time=0.152..878.198 rows=7670 loops=1)"
> " Filter: ((kuupaev >= '2006-04-08'::date) AND (kuupaev
> <= '2006-04-27'::date) AND (NOT (eimuuda)::boolean) AND ((laonr)::double
> precision = 1::double precision) AND ("position"('OSIDVGYKIF'::text,
> (doktyyp)::text) <> 0) AND (((kuupaev):: (..)"

The planner is expecting to get one row from "dok" passing the filter
condition, and hence chooses a plan that is suitable for a small number
of rows ... but in reality there are 7670 rows matching the filter
condition, and that's what blows the runtime out of the water. (Most of
the runtime is actually going into 7670 repeated scans of "rid", which
wouldn't have happened with another plan type.)

So you need to see about getting that estimate to be more accurate.
First thing is to make sure that "dok" has been ANALYZEd --- just do it
by hand. If that doesn't change the EXPLAIN plan then you need to work
harder. I can see at least three things you are doing that are
unnecessarily destroying the planner's ability to estimate the number of
matching rows:

dok.laonr='1'::float8 and

Since laonr apparently isn't float8, this forces a runtime type
conversion as well as interfering with statistics use. (The planner
will have ANALYZE stats about dok.laonr, but the connection to
dok.laonr::float8 escapes it.) Just write the constant with quotes
and no type coercion.

POSITION( dok.doktyyp IN 'OSIDVGYKIF')!=0 AND

This is completely unestimatable given the available statistics, and it
doesn't look to me like it is all that great a semantic representation
either. Perhaps the query that's really meant here is "dok.doktypp IN
('O','S','I', ...)"? If so, you should say what you mean, not play
games with converting the query into some strange string operation.

AND dok.kuupaev||dok.kellaaeg BETWEEN '2006-04-08' AND '2006-04-2723 59'

This is another case where the planner is not going to have any ability
to make a useful estimate, and it's because you are using a crummy
representation of your data. You should merge those two columns into
one timestamp column and just do a simple BETWEEN test.

By and large, unnatural representations of data that you use in WHERE
clauses are going to cost you big-time in SQL queries. It's worth
taking time up front to design a clean table schema, and taking time
to revise it when requirements change.

regards, tom lane

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Jim C. Nasby 2006-04-28 14:58:43 Re: Why so slow?
Previous Message Markus Schaber 2006-04-28 14:46:27 Arrays and index scan