Re: gist indexes for distance calculations

From: Marcelo Zabani <mzabani(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: gist indexes for distance calculations
Date: 2010-10-01 02:50:14
Message-ID: AANLkTimoKwUA75i9kSvf4tvSxbnLJ5p6D68xLJ=EiQkL@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

So let me see if I understand, when searching for everyone within radius "r"
of point (a,b), the gist index will be used like this:
* C is the circle centered on (a,b) of radius "r"

1. Traverse down the tree, starting at the root. Only go down nodes whose
bounding-box has a non-empty intersection with the circle C (how fast is
this verification?)
2. Once the leaves are reached, verify for everyone of them whether they're
inside C or not, returning those that are.

If this really is how it happens, then I ask: What about boxes that
intersect (does that happen)? What if the boxes aren't "nice" (boxes with a
very small desired intersection with the circle and a very large quantity of
unwanted rows).

Also, you've said that with b-tree indexes on two orthogonal coordinates
(two columns) postgresql would need to check the ENTIRE vertical strip
bounded by a-1 and a+1 and the ENTIRE horizontal strip bounded by b-1 and
b+1 (two limitless, "infinite" rectangles)? This leads me to another
question:

- Isn't there an equality check between all the rows returned by both
indexes, and then the actual distance calculations are performed only for
those returned by both indexes? What if I have many more indexes on my
table, how are things done?

And if I may, one last question:
- Is box-bounding the index strategy used for all geometric operations with
gist indexes?

Also, to Oleg:
I had personally tested the use of gist indexes (without limiting the number
of returned rows, however) for more than 15 million rows (with their
coordinates distributed a VERY LARGE area, sadly). The results were still
impressive to me (although I didn't know what to expect, maximum running
times of around 17ms seemed great to me!).
And sorry for sending this message to your personal email (my mistake).

Thanks a lot for all the help, if you can lead me to any docs/articles, I'll
gladly read them.

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Karim Nassar 2010-10-01 04:45:29 Re: gist indexes for distance calculations
Previous Message Mark Kirkwood 2010-09-30 21:01:26 Re: Memory usage - indexes