Re: Optimize date query for large child tables: GiST or GIN?

From: David Jarvis <thangalin(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Thom Brown <thombrown(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Optimize date query for large child tables: GiST or GIN?
Date: 2010-05-21 02:02:34
Message-ID: AANLkTimpwoKQw2sXOl4n2lNOzZz_YpBYLKIkOhVgaTD4@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I took out the date conditions:

SELECT
m.*
FROM
climate.measurement m
WHERE
m.category_id = 1 and
m.station_id = 2043

This uses the station indexes:

"Result (cost=0.00..21781.18 rows=8090 width=28)"
" -> Append (cost=0.00..21781.18 rows=8090 width=28)"
" -> Seq Scan on measurement m (cost=0.00..28.00 rows=1 width=38)"
" Filter: ((category_id = 1) AND (station_id = 2043))"
" -> Bitmap Heap Scan on measurement_01_001 m (cost=11.79..1815.67
rows=677 width=28)"
" Recheck Cond: (station_id = 2043)"
" Filter: (category_id = 1)"
" -> Bitmap Index Scan on measurement_01_001_s_idx
(cost=0.00..11.62 rows=677 width=0)"
" Index Cond: (station_id = 2043)"
" -> Bitmap Heap Scan on measurement_02_001 m (cost=14.47..1682.18
rows=627 width=28)"
" Recheck Cond: (station_id = 2043)"
" Filter: (category_id = 1)"
" -> Bitmap Index Scan on measurement_02_001_s_idx
(cost=0.00..14.32 rows=627 width=0)"
" Index Cond: (station_id = 2043)"

2500+ rows in 185 milliseconds.

That is pretty good (I'll need it to be better but for now it works).

Then combined the selection of the station:

SELECT
m.*
FROM
climate.measurement m,
(SELECT
s.id
FROM
climate.station s,
climate.city c
WHERE
c.id = 5182 AND
s.elevation BETWEEN 0 AND 3000 AND
6371.009 * SQRT(
POW(RADIANS(c.latitude_decimal - s.latitude_decimal), 2) +
(COS(RADIANS(c.latitude_decimal + s.latitude_decimal) / 2) *
POW(RADIANS(c.longitude_decimal - s.longitude_decimal), 2))
) <= 25
) t
WHERE
m.category_id = 1 and
m.station_id = t.id

The station index is no longer used, resulting in full table scans:

"Hash Join (cost=1045.52..1341150.09 rows=14556695 width=28)"
" Hash Cond: (m.station_id = s.id)"
" -> Append (cost=0.00..867011.99 rows=43670085 width=28)"
" -> Seq Scan on measurement m (cost=0.00..25.00 rows=6 width=38)"
" Filter: (category_id = 1)"
" -> Seq Scan on measurement_01_001 m (cost=0.00..71086.96
rows=3580637 width=28)"
" Filter: (category_id = 1)"
" -> Seq Scan on measurement_02_001 m (cost=0.00..64877.40
rows=3267872 width=28)"
" Filter: (category_id = 1)"
" -> Seq Scan on measurement_03_001 m (cost=0.00..71131.44
rows=3582915 width=28)"
" Filter: (category_id = 1)"

How do I avoid the FTS?

(I know about PostGIS but I can only learn and do so much at once.) ;-)

Here's the station query:

SELECT
s.id
FROM
climate.station s,
climate.city c
WHERE
c.id = 5182 AND
s.elevation BETWEEN 0 AND 3000 AND
6371.009 * SQRT(
POW(RADIANS(c.latitude_decimal - s.latitude_decimal), 2) +
(COS(RADIANS(c.latitude_decimal + s.latitude_decimal) / 2) *
POW(RADIANS(c.longitude_decimal - s.longitude_decimal), 2))
) <= 25

And its EXPLAIN:

"Nested Loop (cost=0.00..994.94 rows=4046 width=4)"
" Join Filter: ((6371.009::double precision *
sqrt((pow(radians(((c.latitude_decimal - s.latitude_decimal))::double
precision), 2::double precision) + (cos((radians(((c.latitude_decimal +
s.latitude_decimal))::double precision) / 2::double precision)) *
pow(radians(((c.longitude_decimal - s.longitude_decimal))::double
precision), 2::double precision))))) <= 25::double precision)"
" -> Index Scan using city_pkey1 on city c (cost=0.00..6.27 rows=1
width=16)"
" Index Cond: (id = 5182)"
" -> Seq Scan on station s (cost=0.00..321.08 rows=12138 width=20)"
" Filter: ((s.elevation >= 0) AND (s.elevation <= 3000))"

I get a set of 78 rows returned in very little time.

Thanks again!
Dave

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Matthew Wakeling 2010-05-21 03:27:54 Re: Optimize date query for large child tables: GiST or GIN?
Previous Message David Jarvis 2010-05-21 00:46:43 Re: Optimize date query for large child tables: GiST or GIN?