application of KNN code to US zipcode searches?

From: Mark Stosberg <mark(at)summersault(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: application of KNN code to US zipcode searches?
Date: 2011-02-17 14:40:56
Message-ID: ijjc1o$vgu$1@dough.gmane.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance


We perform over 1,000,000 searches each day for "adoptable shelter pets
near your zipcode". We already have adequate performance for these
searches using the "cube" contrib, but the new KNN work in 9.1 seemed
like it might be a promising way to speed this up even further.

I installed PostgreSQL 9.1 on my laptop to evaluate it, using this post
as a reference:
http://www.depesz.com/index.php/2010/12/11/waiting-for-9-1-knngist/

The first task was to translate a geo-spatial search to use the new KNN
syntax.

I'm most familiar with two approaches to geo-spatial searching with
PostgreSQL. The first is the older "earthdistance" approach, using
"point" types and the "<@>" operator.

The second is the one we are using now, which uses a cube type, the
"cube_distance()" and "earth_box()" method and a GIST index on the cube
type.

Immediately there is a hurdle in that KNN only appears to work with
point types and the <-> operator, which does simple point-to-point
distance, instead of the distance-around-the-earth. Still, I thought
that could be enough of an approximation to test the waters.

I started with some "real world" queries that involved some table joins,
and when those failed to show improvement, I worked with some
reduced-test-case queries.

While I could confirm the new GIST index was being used on the point
type, I couldn't get a query to benchmark better when it was invoked.
I'm wondering if perhaps US zipcode searches aren't good use of this
technology, perhaps because the data set is too small ( About 40,000
zipcodes ).

Given that we can already do GIST-indexed searches with the cube type
that provide good reasonable approximations for zipcode-radius searches,
are others planning to eventually apply the KNN work to US zipcode
searches?

Sample EXPLAIN output and query times are below.

Mark

EXPLAIN ANALYZE SELECT zipcode,
lon_lat <-> '(-118.412426,34.096629)' AS radius
FROM zipcodes ;
-------------------------------------------
Seq Scan on zipcodes (cost=0.00..1257.54 rows=41483 width=22) (actual
time=0.019..84.543 rows=41483 loops=1)
Total runtime: 148.129 ms

EXPLAIN ANALYZE SELECT zipcode,
lon_lat <-> '(-118.412426,34.096629)' As radius
FROM zipcodes
ORDER BY lon_lat <-> '(-118.412426,34.096629)';
--------------------------------------------------
Index Scan using zipcodes_knn on zipcodes (cost=0.00..5365.93
rows=41483 width=22) (actual time=0.451..141.590 rows=41483 loops=1)
Order By: (lon_lat <-> '(-118.412426,34.096629)'::point)
Total runtime: 206.392 ms

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Kevin Grittner 2011-02-17 14:49:28 Re: application of KNN code to US zipcode searches?
Previous Message Pierre C 2011-02-17 07:30:25 Re: high user cpu, massive SELECTs, no io waiting problem