Re: application of KNN code to US zipcode searches?

From: Mark Stosberg <mark(at)summersault(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: application of KNN code to US zipcode searches?
Date: 2011-02-17 16:41:51
Message-ID: ijjj4g$c2k$1@dough.gmane.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance


I tried again to use KNN for a real-world query, and I was able to get
it to add an approximately 6x speed-up vs the cube search or
earthdistance methods ( from 300 ms to 50ms ).

I had to make some notable changes for the KNN index to be considered.

- Of course, I had to switch to using basic point/distance calculation.
As previously noted, this still needs more work to confirm the
accuracy and get the "distance" reported in miles.

- The query planner didn't like it when the "ORDER BY" referred to a
column value instead of a static value, even when I believe it should
know that the column value never changes. See this pseudo-query where
we look-up the coordinates for 90210 once:

EXPLAIN ANALYZE
SELECT pets.pet_id,
zipcodes.lon_lat <-> center.lon_lat AS radius
FROM (SELECT lon_lat FROM zipcodes WHERE zipcode = '90210') AS
center, pets
JOIN shelters USING (shelter_id)
JOIN zipcodes USING (zipcode)
ORDER BY postal_codes.lon_lat <-> center.lon_lat limit 1000;

This didn't use the KNN index until I changed the "center.lon_lat" in
the ORDER BY to an explicit point value. I'm not sure if that's
expected, or something I should take up with -hackers.

This could be worked around by doing a initial query to look-up this
value, and then feed a static value into this query. That's not ideal,
but the combination would still be faster.

- I had to drop the part of the WHERE clause which restricted the
results to shelters within 50 miles from the target zipcode. However,
I could set the "LIMIT" so high that I could get back "enough" pets,
and then the application could trim out the results. Or, perhaps
I could push this query down into a sub-select, and let PostgreSQL
do a second pass to throw out some of the results.

In any case, with a real-world speed-up of 6x, this looks like it will
be worth it to us to continue to investigate.

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2011-02-17 19:13:05 Re: application of KNN code to US zipcode searches?
Previous Message Heikki Linnakangas 2011-02-17 16:41:29 Re: application of KNN code to US zipcode searches?