Skip site navigation (1) Skip section navigation (2)

Re: Optimize date query for large child tables: GiST or GIN?

From: Matthew Wakeling <matthew(at)flymine(dot)org>
To: David Jarvis <thangalin(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thom Brown <thombrown(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Optimize date query for large child tables: GiST or GIN?
Date: 2010-05-21 03:27:54
Message-ID: alpine.DEB.2.00.1005202322470.23895@aragorn.flymine.org (view raw or flat)
Thread:
Lists: pgsql-performance
On Thu, 20 May 2010, David Jarvis wrote:
> I took out the date conditions:
>
> SELECT
>  m.*
> FROM
>  climate.measurement m
> WHERE
>  m.category_id = 1 and
>  m.station_id = 2043
>
> This uses the station indexes:

Yes, because there is only one station_id selected. That's exactly what an 
index is for.

> Then combined the selection of the station:
> The station index is no longer used, resulting in full table scans:

> "Nested Loop  (cost=0.00..994.94 rows=4046 width=4)"
> "  Join Filter: ((6371.009::double precision *
> sqrt((pow(radians(((c.latitude_decimal - s.latitude_decimal))::double
> precision), 2::double precision) + (cos((radians(((c.latitude_decimal +
> s.latitude_decimal))::double precision) / 2::double precision)) *
> pow(radians(((c.longitude_decimal - s.longitude_decimal))::double
> precision), 2::double precision))))) <= 25::double precision)"
> "  ->  Index Scan using city_pkey1 on city c  (cost=0.00..6.27 rows=1
> width=16)"
> "        Index Cond: (id = 5182)"
> "  ->  Seq Scan on station s  (cost=0.00..321.08 rows=12138 width=20)"
> "        Filter: ((s.elevation >= 0) AND (s.elevation <= 3000))"
>
> I get a set of 78 rows returned in very little time.

(An EXPLAIN ANALYSE would be better here). Look at the expected number of 
stations returned. It expects 4046 which is a large proportion of the 
available stations. It therefore expects to have to touch a large 
proportion of the measurement table, therefore it thinks that it will be 
fastest to do a seq scan. In actual fact, for 78 stations, the index would 
be faster, but for 4046 it wouldn't.

If you will be querying by season quite regularly, had you considered 
partitioning by season?

Matthew

-- 
 Geography is going places.

In response to

Responses

pgsql-performance by date

Next:From: David JarvisDate: 2010-05-21 04:11:52
Subject: Re: Optimize date query for large child tables: GiST or GIN?
Previous:From: David JarvisDate: 2010-05-21 02:02:34
Subject: Re: Optimize date query for large child tables: GiST or GIN?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group