Quick Links

Re: optimizing a geo_distance() proximity query (example and benchmark)

From:	Mark Stosberg <mark(at)summersault(dot)com>
To:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: optimizing a geo_distance() proximity query (example and benchmark)
Date:	2007-02-05 23:01:05
Message-ID:	eq8d0b$26cd$1@news.hub.org
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Merlin Moncure wrote:
> On 2/5/07, Mark Stosberg <mark(at)summersault(dot)com> wrote:
>> Bruno Wolff III wrote:
>> > On Sat, Feb 03, 2007 at 14:00:26 -0500,
>> > Mark Stosberg <mark(at)summersault(dot)com> wrote:
>> >> I'm using geo_distance() from contrib/earthdistance would like to
>> find a
>> >> way to spend up the geo distance calculation if possible. This is
>> for a
>> >> proximity search: "Show me adoptable pets within 250 miles of this
>> >> zipcode".
>> >
>> > If you are using the "cube" based part of the earth distance package,
>> > then you can use gist indexes to speed those searches up.
>>
>> Thanks for the tip. Any idea what kind of improvement I can expect to
>> see, compared to using geo_distance()?
>
> a lot. be aware that gist takes longer to build than btree, but very
> fast to search. Index search and filter to box is basically an index
> lookup (fast!). for mostly static datasets that involve a lot of
> searching, gist is ideal.

The documentation in contrib/ didn't provide examples of how to create
or the index or actually a the proximity search. Here's what I figured
out to do:

I added a new column as type 'cube':

ALTER table zipcodes add column earth_coords cube;

Next I converted the old lat/lon data I had stored in a 'point'
type to the new format:

-- Make to get lat/lon in the right order for your data model!
UPDATE zipcodes set earth_coords = ll_to_earth( lon_lat[1], lon_lat[0] );

Now I added a GIST index on the field:

CREATE index earth_coords_idx on zipcodes using gist (earth_coords);

Finally, I was able to run a query, which I could see used the index (by
checking "EXPLAIN ANALYZE ..."

select * from zipcodes where earth_box('(436198.322855334,
4878562.8732218, 4085386.43843934)'::cube,16093.44) @ earth_coords;

It's also notable that the units used are meters, not miles like
geo_distance(). That's what the magic number of "16093.44" is-- 10 miles
converted to meters.

When I benchmarked this query against the old geo_distance() variation,
it was about 200 times faster (~100ms vs .5ms).

However, my next step was to try a more "real world" query that involved
a more complex where clause and a couple of table joins. So far, that
result is coming out /slower/ with the new approach, even though the
index is being used. I believe this may be cause of the additional
results found that are outside of the sphere, but inside the cube. This
causes additional rows that need processing in the joined tables.

Could someone post an example of how to further refine this so the
results more closely match what geo_distance returns() ?

Any other indexing or optimization tips would be appreciated.

Mark

In response to

Re: optimizing a geo_distance() proximity query at 2007-02-05 20:15:15 from Merlin Moncure

Responses

Re: optimizing a geo_distance() proximity query (example and benchmark) at 2007-02-06 05:40:11 from Bruno Wolff III
Re: optimizing a geo_distance() proximity query (example and benchmark) at 2007-02-06 12:58:29 from Merlin Moncure
tip: faster sorting for proximity queries by using cube_distance() at 2007-02-07 21:30:16 from Mark Stosberg

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Karen Hill	2007-02-06 00:35:59	How long should it take to insert 200,000 records?
Previous Message	Bruno Wolff III	2007-02-05 20:22:09	Re: optimizing a geo_distance() proximity query