Yes, that is what happened, making the tests rather meaningless, and giving
me the false impression that the indexes were being used. They were but only
because of cached results. When multiple users making different queries, the
performance will return to ~80s per query.
I also tried Kevin's suggestion, which had no noticeable effect:
effective_cache_size = 512MB
That said, when using the following condition, the query is fast (1 second):
extract(YEAR FROM sc.taken_start) >= 1963 AND
extract(YEAR FROM sc.taken_end) <= 2009 AND
" -> Index Scan using measurement_013_stc_idx on
measurement_013 m (cost=0.00..511.00 rows=511 width=15) (actual
time=0.018..3.601 rows=3356 loops=104)"
" Index Cond: ((m.station_id = sc.station_id) AND
(m.taken >= sc.taken_start) AND (m.taken <= sc.taken_end) AND (m.category_id
This condition makes it slow (13 seconds on first run, 8 seconds
* extract(YEAR FROM sc.taken_start) >= 1900 AND
* extract(YEAR FROM sc.taken_end) <= 2009 AND
" Filter: (category_id = 7)"
" -> Seq Scan on measurement_013 m
(cost=0.00..359704.80 rows=18118464 width=15) (actual time=0.008..4025.692
At this point, I'm tempted to write a stored procedure that iterates over
each station category for all the years of each station. My guess is that
the planner's estimate for the number of rows that will be returned by
FROM sc.taken_start) >= 1900* is incorrect and so it chooses a full table
scan for all rows. Even though the lower bound appears to be a constant
value of the 1900, the average year a station started collecting data was 44
years ago (1965), and did so for an average of 21.4 years.
The part I am having trouble with is convincing PG to use the index for the
station ID and the date range for when the station was active. Each station
has a unique ID; the data in the measurement table is ordered by measurement
date then by station.
Should I add a clustered index by station then by date?
Any other suggestions are very much appreciated.
In response to
pgsql-performance by date
|Next:||From: Robert Haas||Date: 2010-05-26 04:13:42|
|Subject: Re: Function scan/Index scan to nested loop|
|Previous:||From: Rob Wultsch||Date: 2010-05-26 00:56:58|
|Subject: Re: Random Page Cost and Planner|