Re: Optimize date query for large child tables: GiST or GIN?

From: David Jarvis <thangalin(at)gmail(dot)com>
To: Matthew Wakeling <matthew(at)flymine(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thom Brown <thombrown(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Optimize date query for large child tables: GiST or GIN?
Date: 2010-05-21 04:11:52
Message-ID: AANLkTimjPLmnSZDe3nNBVGjJ0CGIQd_Ic8IxHb4fUG9v@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi,

(An EXPLAIN ANALYSE would be better here). Look at the expected number of
> stations

"Nested Loop (cost=0.00..994.94 rows=4046 width=4) (actual
time=0.053..41.173 rows=78 loops=1)"
" Join Filter: ((6371.009::double precision *
sqrt((pow(radians(((c.latitude_decimal - s.latitude_decimal))::double
precision), 2::double precision) + (cos((radians(((c.latitude_decimal +
s.latitude_decimal))::double precision) / 2::double precision)) *
pow(radians(((c.longitude_decimal - s.longitude_decimal))::double
precision), 2::double precision))))) <= 25::double precision)"
" -> Index Scan using city_pkey1 on city c (cost=0.00..6.27 rows=1
width=16) (actual time=0.014..0.016 rows=1 loops=1)"
" Index Cond: (id = 5182)"
" -> Seq Scan on station s (cost=0.00..321.08 rows=12138 width=20)
(actual time=0.007..5.256 rows=12139 loops=1)"
" Filter: ((s.elevation >= 0) AND (s.elevation <= 3000))"
"Total runtime: 41.235 ms"

expects to have to touch a large proportion of the measurement table,
> therefore it thinks that it will be fastest to do a seq scan. In actual
> fact, for 78 stations, the index would be faster, but for 4046 it wouldn't.
>

This is rather unexpected. I'd have figured it would use the actual number.

> If you will be querying by season quite regularly, had you considered
> partitioning by season?
>

I have no idea what the "regular" queries will be. The purpose of the system
is to open the data up to the public using a simple user interface so that
they can generate their own custom reports. That user interface allows
people to pick year intervals, day ranges, elevations, categories
(temperature, precipitation, snow depth, etc.), and lat/long perimeter
coordinates (encompassing any number of stations) or a city and radius.

Dave

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Yeb Havinga 2010-05-21 08:38:43 Re: Optimize date query for large child tables: GiST or GIN?
Previous Message Matthew Wakeling 2010-05-21 03:27:54 Re: Optimize date query for large child tables: GiST or GIN?