Re: application of KNN code to US zipcode searches?

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Mark Stosberg <mark(at)summersault(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: application of KNN code to US zipcode searches?
Date: 2011-02-17 20:17:52
Message-ID: Pine.LNX.4.64.1102172316360.278@sn.sai.msu.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Mark,

we investigating pgsphere http://pgsphere.projects.postgresql.org/, if we could add KNN support.

Oleg
On Thu, 17 Feb 2011, Mark Stosberg wrote:

>
>> I thought the benefit of KNN was that you could retrieve the rows in
>> distance order, so that a query for the closest 20 locations (for
>> example) would be very fast. I wouldn't have expected it to be
>> helpful when you're selecting all the rows regardless of distance.
>
> Kevin,
>
> Thanks for the feedback. You are right that my "reduced test case"
> wasn't a good approximation. I added a limit, to simulate finding the
> 100 zipcodes closest to 90210.
>
> Below I compare 4 approaches to the same query:
>
> 1. Cube search
> 2. Earth Distance Search
> 3. Simple point distance (no index)
> 4. Simple point distance (KNN)
>
> Now KNN benchmarks to be almost 100x faster! That's very promising.
> Then there's only the issue that simple point distance is not expected
> to be a good enough approximation of earth-distances. Perhaps that can
> be solved by pre-computing coordinates based on the lat/long pairs....
> much like the map projections used to present a curved surface on a flat
> map? Given that's OK to be be a few miles off, it seems we have some
> leeway here.
>
> Recommendations?
>
> Mark
>
> EXPLAIN ANALYZE
> SELECT zipcode,
> cube_distance( '(-2513120.64361786, -4645511.0460328,
> 3575538.9507084)', zipcodes.earth_coords)/1609.344 AS radius
> FROM zipcodes ORDER BY radius LIMIT 100;
>
> ---------------------------------------------------------------
> Limit (cost=2946.70..2946.95 rows=100 width=62) (actual
> time=167.650..168.064 rows=100 loops=1)
> -> Sort (cost=2946.70..3050.40 rows=41483 width=62) (actual
> time=167.644..167.829 rows=100 loops=1)
> Sort Key: ((cube_distance('(-2513120.64361786,
> -4645511.0460328, 3575538.9507084)'::cube, earth_coords) /
> 1609.344::double precision))
> Sort Method: top-N heapsort Memory: 20kB
> -> Seq Scan on zipcodes (cost=0.00..1361.24 rows=41483
> width=62) (actual time=0.030..90.807 rows=41483 loops=1)
> Total runtime: 168.300 ms
>
> ############################################################3
>
> -- Using Earthdistance
> EXPLAIN ANALYZE SELECT zipcode,
> lon_lat <@> '(-118.412426,34.096629)' As radius
> FROM zipcodes
> ORDER BY lon_lat <@> '(-118.412426,34.096629)'
> LIMIT 100;
>
> ------------------------------------------------------------
> Limit (cost=2842.99..2843.24 rows=100 width=22) (actual
> time=187.995..188.451 rows=100 loops=1)
> -> Sort (cost=2842.99..2946.70 rows=41483 width=22) (actual
> time=187.989..188.149 rows=100 loops=1)
> Sort Key: ((lon_lat <@> '(-118.412426,34.096629)'::point))
> Sort Method: top-N heapsort Memory: 20kB
> -> Seq Scan on zipcodes (cost=0.00..1257.54 rows=41483
> width=22) (actual time=0.033..108.203 rows=41483 loops=1)
> Total runtime: 188.660 ms
>
> ##########################################
>
> Using simple point distance, but with no Gist Index:
>
> EXPLAIN ANALYZE SELECT zipcode,
> lon_lat <-> '(-118.412426,34.096629)' As radius
> FROM zipcodes
> ORDER BY lon_lat <-> '(-118.412426,34.096629)'
> LIMIT 100;
>
> --------------------------------------------------------
> Limit (cost=2842.99..2843.24 rows=100 width=22) (actual
> time=160.574..161.057 rows=100 loops=1)
> -> Sort (cost=2842.99..2946.70 rows=41483 width=22) (actual
> time=160.568..160.691 rows=100 loops=1)
> Sort Key: ((lon_lat <-> '(-118.412426,34.096629)'::point))
> Sort Method: top-N heapsort Memory: 20kB
> -> Seq Scan on zipcodes (cost=0.00..1257.54 rows=41483
> width=22) (actual time=0.027..84.610 rows=41483 loops=1)
> Total runtime: 161.226 ms
>
> #########################################
>
> -- Using KNN-GIST index
> EXPLAIN ANALYZE SELECT zipcode,
> lon_lat <-> '(-118.412426,34.096629)' As radius
> FROM zipcodes
> ORDER BY lon_lat <-> '(-118.412426,34.096629)'
> LIMIT 100;
> ------------------------------------------------------------------
> Limit (cost=0.00..12.94 rows=100 width=22) (actual time=0.447..1.892
> rows=100 loops=1)
> -> Index Scan using zipcodes_knn on zipcodes (cost=0.00..5365.93
> rows=41483 width=22) (actual time=0.440..1.407 rows=100 loops=1)
> Order By: (lon_lat <-> '(-118.412426,34.096629)'::point)
> Total runtime: 2.198 ms
>
>
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Mark Stosberg 2011-02-17 20:38:30 Re: application of KNN code to US zipcode searches?
Previous Message Tom Lane 2011-02-17 19:17:17 Re: application of KNN code to US zipcode searches?