Skip site navigation (1) Skip section navigation (2)

application of KNN code to US zipcode searches?

From: Mark Stosberg <mark(at)summersault(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: application of KNN code to US zipcode searches?
Date: 2011-02-17 14:40:56
Message-ID: ijjc1o$vgu$ (view raw or flat)
Lists: pgsql-performance
We perform over 1,000,000 searches each day for "adoptable shelter pets
near your zipcode". We already have adequate performance for these
searches using the "cube" contrib, but the new KNN work in 9.1 seemed
like it might be a promising way to speed this up even further.

I installed PostgreSQL 9.1 on my laptop to evaluate it, using this post
as a reference:

The first task was to translate a geo-spatial search to use the new KNN

I'm most familiar with two approaches to geo-spatial searching with
PostgreSQL. The first is the older "earthdistance" approach, using
"point" types and the "<@>" operator.

The second is the one we are using now, which uses a cube type, the
"cube_distance()" and "earth_box()" method and a GIST index on the cube

Immediately there is a hurdle in that KNN only appears to work with
point types and the <-> operator, which does simple point-to-point
distance, instead of the distance-around-the-earth. Still, I thought
that could be enough of an approximation to test the waters.

I started with some "real world" queries that involved some table joins,
and when those failed to show improvement, I worked with some
reduced-test-case queries.

While I could confirm the new GIST index was being used on the point
type, I couldn't get a query to benchmark better when it was invoked.
I'm wondering if perhaps US zipcode searches aren't good use of this
technology, perhaps because the data set is too small ( About 40,000
zipcodes ).

Given that we can already do GIST-indexed searches with the cube type
that provide good reasonable approximations for zipcode-radius searches,
are others planning to eventually apply the KNN work to US zipcode

Sample EXPLAIN output and query times are below.


    lon_lat <-> '(-118.412426,34.096629)' AS radius
    FROM zipcodes ;
 Seq Scan on zipcodes  (cost=0.00..1257.54 rows=41483 width=22) (actual
time=0.019..84.543 rows=41483 loops=1)
 Total runtime: 148.129 ms

    lon_lat <-> '(-118.412426,34.096629)' As radius
    FROM zipcodes
    ORDER BY lon_lat <-> '(-118.412426,34.096629)';
 Index Scan using zipcodes_knn on zipcodes  (cost=0.00..5365.93
rows=41483 width=22) (actual time=0.451..141.590 rows=41483 loops=1)
   Order By: (lon_lat <-> '(-118.412426,34.096629)'::point)
 Total runtime: 206.392 ms


pgsql-performance by date

Next:From: Kevin GrittnerDate: 2011-02-17 14:49:28
Subject: Re: application of KNN code to US zipcode searches?
Previous:From: Pierre CDate: 2011-02-17 07:30:25
Subject: Re: high user cpu, massive SELECTs, no io waiting problem

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group