Skip site navigation (1) Skip section navigation (2)

Re: application of KNN code to US zipcode searches?

From: Mark Stosberg <mark(at)summersault(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: application of KNN code to US zipcode searches?
Date: 2011-02-17 16:41:51
Message-ID: ijjj4g$c2k$1@dough.gmane.org (view raw or flat)
Thread:
Lists: pgsql-performance
I tried again to use KNN for a real-world query, and I was able to get
it to add an approximately 6x speed-up vs the cube search or
earthdistance methods ( from 300 ms to 50ms ).

I had to make some notable changes for the KNN index to be considered.

- Of course, I had to switch to using basic point/distance calculation.
  As previously noted, this still needs more work to confirm the
  accuracy and get the "distance" reported in miles.

- The query planner didn't like it when the "ORDER BY" referred to a
  column value instead of a static value, even when I believe it should
  know that the column value never changes. See this pseudo-query where
  we look-up the coordinates for 90210 once:

  EXPLAIN ANALYZE
  SELECT pets.pet_id,
      zipcodes.lon_lat <-> center.lon_lat AS radius
      FROM (SELECT lon_lat FROM zipcodes WHERE zipcode = '90210') AS
center, pets
      JOIN shelters USING (shelter_id)
      JOIN zipcodes USING (zipcode)
       ORDER BY postal_codes.lon_lat <-> center.lon_lat limit 1000;

  This didn't use the KNN index until I changed the "center.lon_lat" in
  the ORDER BY to an explicit point value. I'm not sure if that's
  expected, or something I should take up with -hackers.

  This could be worked around by doing a initial query to look-up this
  value, and then feed a static value into this query. That's not ideal,
  but the combination would still be faster.

- I had to drop the part of the WHERE clause which restricted the
  results to shelters within 50 miles from the target zipcode. However,
  I could set the "LIMIT" so high that I could get back "enough" pets,
  and then the application could trim out the results. Or, perhaps
  I could push this query down into a sub-select, and let PostgreSQL
  do a second pass to throw out some of the results.

In any case, with a real-world speed-up of 6x, this looks like it will
be worth it to us to continue to investigate.



In response to

Responses

pgsql-performance by date

Next:From: Tom LaneDate: 2011-02-17 19:13:05
Subject: Re: application of KNN code to US zipcode searches?
Previous:From: Heikki LinnakangasDate: 2011-02-17 16:41:29
Subject: Re: application of KNN code to US zipcode searches?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group