Re: optimizing a geo_distance() proximity query

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Mark Stosberg <mark(at)summersault(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: optimizing a geo_distance() proximity query
Date: 2007-02-03 19:11:51
Message-ID: Pine.LNX.4.64.0702032206340.400@sn.sai.msu.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Mark,

in astronomy we extensively use such kind of query, which we call
radial search or conesearch. There are several algorithms which perform
efficiently such query using spherical coordinates.
Specifically, we use our Q3C algorithm, see
http://www.sai.msu.su/~megera/wiki/SkyPixelization for details,
which was designed for PostgreSQL and is freely available.

The paper is http://lnfm1.sai.msu.ru/~math/docs/adass_proceedings2005.pdf
Web site - http://q3c.sf.net/

Oleg

On Sat, 3 Feb 2007, Mark Stosberg wrote:

>
> Hello,
>
> I'm using geo_distance() from contrib/earthdistance would like to find a
> way to spend up the geo distance calculation if possible. This is for a
> proximity search: "Show me adoptable pets within 250 miles of this
> zipcode".
>
> I'm researched a number of approaches to this, but none seem as workable
> as I would have hoped.
>
> I read this claim [1] that Jobster used a lookup table of pre-calculated
> distances between zipcodes...it took about 100 million rows.
>
> 1. http://bostonsteamer.livejournal.com/831325.html
>
> I'd like to avoid that, but I think there's a sound concept in there: we
> repeatedly making a complex calculation with the same inputs, and the
> outputs are always the same.
>
> The zipdy project [2] used some interesting approaches, both similar to
> the large table idea. One variation involved a PL routine that would
> look up the result in a cache table. If no result was found, it would
> would compute the result and add it to the cache table. Besides
> eventually creating millions of rows in the cache table, I tried this
> technique and found it was much slower than using geo_distance() without
> a cache. Another variation in the zipdy distribution just uses several
> smaller cache tables, like one for "zipcodes 25 miles away" and
> "zipcodes 50 miles away". Equally unattractive.
>
> 2. http://www.cryptnet.net/fsp/zipdy/
>
> I looked at doing the calculation outside of PostgreSQL, and passing in
> the resulting list of zipcodes in an explicit IN() list. This seem
> promising at first. Geo::Postalcode (Perl) could do the calculation in
> 5ms, which seemed to beat PostgreSQL. For a small proximity, I think
> that combination might have performed better. However, some places have
> close to 5,000 zipcodes within 250 files. I tried putting /that/
> resulting list into an explicitly IN() clause, and it got much slower. :)
>
> I did find there are some possible optimizations that can be made to the
> Haversine algorithm itself. As this post pointed out [3], we could
> pre-convert the lat/lon pair to radians, and also compute their sin()
> and cos() values. However, the person suggesting this approach provided
> no benchmarks to suggest it was worth it, and I have no evidence so far
> that it matters either.
>
> 3.
> http://www.voxclandestina.com/2006-09-01/optimizing-spatial-proximity-searches-in-sql/
>
> What strikes me to consider at this point are a couple of options:
>
> A. Explore a way add some memory caching or "memoizing" to
> geo_distance() so it would hold on to frequently pre-computed values,
> but without storing all the millions of possibilities.
>
> B. Look at an alternate implementation. I suspect that given a small
> enough radius and the relatively large size of zipcodes, a simpler
> representation of the Earth's curvature could be used, with a sufficient
> accuracy. Perhaps a cylinder, or even a flat projection... We currently
> max out at 250 miles. ( I just discussed this option with my wife, the
> math teacher. :)
>
> Advice from other people who have deployed high-performance proximity
> searches with PostgreSQL would be appreciated!
>
> Mark
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
> choose an index scan if your joining column's datatypes do not
> match
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Bruno Wolff III 2007-02-04 05:16:44 Re: optimizing a geo_distance() proximity query
Previous Message Mark Stosberg 2007-02-03 19:00:26 optimizing a geo_distance() proximity query