Skip site navigation (1) Skip section navigation (2)

Re: Optimize date query for large child tables: GiST or GIN?

From: David Jarvis <thangalin(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Thom Brown <thombrown(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Optimize date query for large child tables: GiST or GIN?
Date: 2010-05-21 02:02:34
Message-ID: AANLkTimpwoKQw2sXOl4n2lNOzZz_YpBYLKIkOhVgaTD4@mail.gmail.com (view raw or flat)
Thread:
Lists: pgsql-performance
I took out the date conditions:

SELECT
  m.*
FROM
  climate.measurement m
WHERE
  m.category_id = 1 and
  m.station_id = 2043

This uses the station indexes:

"Result  (cost=0.00..21781.18 rows=8090 width=28)"
"  ->  Append  (cost=0.00..21781.18 rows=8090 width=28)"
"        ->  Seq Scan on measurement m  (cost=0.00..28.00 rows=1 width=38)"
"              Filter: ((category_id = 1) AND (station_id = 2043))"
"        ->  Bitmap Heap Scan on measurement_01_001 m  (cost=11.79..1815.67
rows=677 width=28)"
"              Recheck Cond: (station_id = 2043)"
"              Filter: (category_id = 1)"
"              ->  Bitmap Index Scan on measurement_01_001_s_idx
(cost=0.00..11.62 rows=677 width=0)"
"                    Index Cond: (station_id = 2043)"
"        ->  Bitmap Heap Scan on measurement_02_001 m  (cost=14.47..1682.18
rows=627 width=28)"
"              Recheck Cond: (station_id = 2043)"
"              Filter: (category_id = 1)"
"              ->  Bitmap Index Scan on measurement_02_001_s_idx
(cost=0.00..14.32 rows=627 width=0)"
"                    Index Cond: (station_id = 2043)"

2500+ rows in 185 milliseconds.

That is pretty good (I'll need it to be better but for now it works).

Then combined the selection of the station:

SELECT
  m.*
FROM
  climate.measurement m,
  (SELECT
     s.id
   FROM
     climate.station s,
     climate.city c
   WHERE
     c.id = 5182 AND
     s.elevation BETWEEN 0 AND 3000 AND
     6371.009 * SQRT(
       POW(RADIANS(c.latitude_decimal - s.latitude_decimal), 2) +
       (COS(RADIANS(c.latitude_decimal + s.latitude_decimal) / 2) *
        POW(RADIANS(c.longitude_decimal - s.longitude_decimal), 2))
     ) <= 25
   ) t
WHERE
  m.category_id = 1 and
  m.station_id = t.id

The station index is no longer used, resulting in full table scans:

"Hash Join  (cost=1045.52..1341150.09 rows=14556695 width=28)"
"  Hash Cond: (m.station_id = s.id)"
"  ->  Append  (cost=0.00..867011.99 rows=43670085 width=28)"
"        ->  Seq Scan on measurement m  (cost=0.00..25.00 rows=6 width=38)"
"              Filter: (category_id = 1)"
"        ->  Seq Scan on measurement_01_001 m  (cost=0.00..71086.96
rows=3580637 width=28)"
"              Filter: (category_id = 1)"
"        ->  Seq Scan on measurement_02_001 m  (cost=0.00..64877.40
rows=3267872 width=28)"
"              Filter: (category_id = 1)"
"        ->  Seq Scan on measurement_03_001 m  (cost=0.00..71131.44
rows=3582915 width=28)"
"              Filter: (category_id = 1)"

How do I avoid the FTS?

(I know about PostGIS but I can only learn and do so much at once.) ;-)

Here's the station query:

SELECT
  s.id
FROM
  climate.station s,
  climate.city c
WHERE
  c.id = 5182 AND
  s.elevation BETWEEN 0 AND 3000 AND
  6371.009 * SQRT(
    POW(RADIANS(c.latitude_decimal - s.latitude_decimal), 2) +
    (COS(RADIANS(c.latitude_decimal + s.latitude_decimal) / 2) *
    POW(RADIANS(c.longitude_decimal - s.longitude_decimal), 2))
  ) <= 25

And its EXPLAIN:

"Nested Loop  (cost=0.00..994.94 rows=4046 width=4)"
"  Join Filter: ((6371.009::double precision *
sqrt((pow(radians(((c.latitude_decimal - s.latitude_decimal))::double
precision), 2::double precision) + (cos((radians(((c.latitude_decimal +
s.latitude_decimal))::double precision) / 2::double precision)) *
pow(radians(((c.longitude_decimal - s.longitude_decimal))::double
precision), 2::double precision))))) <= 25::double precision)"
"  ->  Index Scan using city_pkey1 on city c  (cost=0.00..6.27 rows=1
width=16)"
"        Index Cond: (id = 5182)"
"  ->  Seq Scan on station s  (cost=0.00..321.08 rows=12138 width=20)"
"        Filter: ((s.elevation >= 0) AND (s.elevation <= 3000))"

I get a set of 78 rows returned in very little time.

Thanks again!
Dave

In response to

Responses

pgsql-performance by date

Next:From: Matthew WakelingDate: 2010-05-21 03:27:54
Subject: Re: Optimize date query for large child tables: GiST or GIN?
Previous:From: David JarvisDate: 2010-05-21 00:46:43
Subject: Re: Optimize date query for large child tables: GiST or GIN?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group