From: | David Jarvis <thangalin(at)gmail(dot)com> |
---|---|
To: | Rob Wultsch <wultsch(at)gmail(dot)com> |
Cc: | Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Random Page Cost and Planner |
Date: | 2010-05-26 06:13:45 |
Message-ID: | AANLkTinexztvMHO-IKfEtM2ni371vgX51sV5sRcHMWG0@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Hi, Rob.
I tried bumping the effective_cache_size. It made no difference.
My latest attempt at forcing PostgreSQL to use the indexes involved two
loops: one to loop over the stations, the other to extract the station data
from the measurement table. The outer loop executes in 1.5 seconds. The
inner loop does a full table scan for each record in the outer loop:
FOR station IN
SELECT
sc.station_id,
sc.taken_start,
sc.taken_end
FROM
climate.city c,
climate.station s,
climate.station_category sc
WHERE
c.id = city_id AND
earth_distance(
ll_to_earth(c.latitude_decimal,c.longitude_decimal),
ll_to_earth(s.latitude_decimal,s.longitude_decimal)) / 1000 <=
radius AND
s.elevation BETWEEN elevation1 AND elevation2 AND
s.applicable AND
sc.station_id = s.id AND
sc.category_id = category_id AND
extract(YEAR FROM sc.taken_start) >= year1 AND
extract(YEAR FROM sc.taken_end) <= year2
ORDER BY
sc.station_id
LOOP
RAISE NOTICE 'B.1. % % %', station.station_id, station.taken_start,
station.taken_end;
FOR measure IN
SELECT
extract(YEAR FROM m.taken) AS year,
avg(m.amount) AS amount
FROM
climate.measurement m
WHERE
* m.station_id = station.station_id AND
m.taken BETWEEN station.taken_start AND station.taken_end AND
m.category_id = category_id
* GROUP BY
extract(YEAR FROM m.taken)
LOOP
RAISE NOTICE ' B.2. % %', measure.year, measure.amount;
END LOOP;
END LOOP;
I thought that the bold lines would have evoked index use. The values used
for the inner query:
NOTICE: B.1. 754 1980-08-01 2001-11-30
When I run the query manually, using constants, it executes in ~25
milliseconds:
SELECT
extract(YEAR FROM m.taken) AS year,
avg(m.amount) AS amount
FROM
climate.measurement m
WHERE
m.station_id = 754 AND
m.taken BETWEEN '1980-08-01'::date AND '2001-11-30'::date AND
m.category_id = 7
GROUP BY
extract(YEAR FROM m.taken)
With 106 rows it should execute in ~2.65 seconds, which is better than the 5
seconds I get when everything is cached and a tremendous improvement over
the ~85 seconds from cold.
I do not understand why the below query uses a full table scan (executes in
~13 seconds):
SELECT
extract(YEAR FROM m.taken) AS year,
avg(m.amount) AS amount
FROM
climate.measurement m
WHERE
* m.station_id = station.station_id AND*
* m.taken BETWEEN station.taken_start AND station.taken_end AND*
* m.category_id = category_id*
GROUP BY
extract(YEAR FROM m.taken)
Moreover, what can I do to solve the problem?
Thanks again!
Dave
From | Date | Subject | |
---|---|---|---|
Next Message | Alexey Klyukin | 2010-05-26 09:00:18 | Re: Random Page Cost and Planner |
Previous Message | Robert Haas | 2010-05-26 04:25:56 | Re: Performance issues when the number of records are around 10 Million |