Skip site navigation (1) Skip section navigation (2)

Re: Random Page Cost and Planner

From: David Jarvis <thangalin(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Random Page Cost and Planner
Date: 2010-05-26 03:50:09
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-performance
Hi, Tom.

Yes, that is what happened, making the tests rather meaningless, and giving
me the false impression that the indexes were being used. They were but only
because of cached results. When multiple users making different queries, the
performance will return to ~80s per query.

I also tried Kevin's suggestion, which had no noticeable effect:
effective_cache_size = 512MB

That said, when using the following condition, the query is fast (1 second):

    extract(YEAR FROM sc.taken_start) >= 1963 AND
    extract(YEAR FROM sc.taken_end) <= 2009 AND

"                    ->  Index Scan using measurement_013_stc_idx on
measurement_013 m  (cost=0.00..511.00 rows=511 width=15) (actual
time=0.018..3.601 rows=3356 loops=104)"
"                          Index Cond: ((m.station_id = sc.station_id) AND
(m.taken >= sc.taken_start) AND (m.taken <= sc.taken_end) AND (m.category_id
= 7))"

This condition makes it slow (13 seconds on first run, 8 seconds

*    extract(YEAR FROM sc.taken_start) >= 1900 AND
*    extract(YEAR FROM sc.taken_end) <= 2009 AND

"                          Filter: (category_id = 7)"
"                    ->  Seq Scan on measurement_013 m
(cost=0.00..359704.80 rows=18118464 width=15) (actual time=0.008..4025.692
rows=18118395 loops=1)"

At this point, I'm tempted to write a stored procedure that iterates over
each station category for all the years of each station. My guess is that
the planner's estimate for the number of rows that will be returned by
FROM sc.taken_start) >= 1900* is incorrect and so it chooses a full table
scan for all rows. Even though the lower bound appears to be a constant
value of the 1900, the average year a station started collecting data was 44
years ago (1965), and did so for an average of 21.4 years.

The part I am having trouble with is convincing PG to use the index for the
station ID and the date range for when the station was active. Each station
has a unique ID; the data in the measurement table is ordered by measurement
date then by station.

Should I add a clustered index by station then by date?

Any other suggestions are very much appreciated.


In response to


pgsql-performance by date

Next:From: Robert HaasDate: 2010-05-26 04:13:42
Subject: Re: Function scan/Index scan to nested loop
Previous:From: Rob WultschDate: 2010-05-26 00:56:58
Subject: Re: Random Page Cost and Planner

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group