Skip site navigation (1)
Skip section navigation (2)
## Re: find close (duplicate) points + create index

### In response to

### pgsql-novice by date

On Wed, 10 Mar 2004, Tom Lane wrote: > Elinor Medezinski <elinor(at)bellatrix(dot)tau(dot)ac(dot)il> writes: > > I'm trying to find duplicate entries, where two entries are considered > > duplicates if they're within a radius of 1, meaning something like > > "select point from pointtable where distance between points <=1". > > Obviously this is not SQL syntax. > > Well, it is if you do a self-join: > > select * from pointtable a, pointtable b > where distance(a.point, b.point) <= 1; > > Postgres spells the "distance" operator as "<->", so this becomes > > select * from pointtable a, pointtable b > where (a.point <-> b.point) <= 1; > > Making it fast is a more difficult problem :-( ... if you write the > above query as-is then the system will sit there and compare each row of > pointtable to each other row, looking for pairs of rows that match the > where-clause. Okay if you just have some thousands of rows, but on a > big table this will take longer than you want to wait. I'm guessing distance is defined to be: sqrt( (x1-x0)^2 + (y1-y0)^2 ) or sqrt( (x1-x0)^2 + (y1-y0)^2 + (z1-z0)^2 ) You obviously need to find the difference in the x,y (and possibly z if used) coordinates, as is multiplying that difference by itself and summing them. However, you don't need to take the square root, which is often a computationally expensive thing. Two points which are close in terms of distance, are also close in terms of distance squared. If you are only interested in points that are within a distance of 1 (or a distance squared of 1*1), then you can stop calculating the distance when any of the differences is bigger than 1. Since you don't care, other than knowing it is too big, just stop the calculation and return some number bigger than 1. That should help in speeding things up a bit. How you do this in PostgreSQL, I can't help you. :-) > > Also, I also tried to build an index on that column, but > > there's no operator class for type point. How can I do that? > > A btree index on a point column would be quite useless, since btree > understands only a one-dimensional continuum with less-than, equal, > greater-than relationships. But I think you might be able to do > something with an rtree index. I'd look at making an rtree index on > the unit box around each point, and then using an "overlaps" test as > an indexable coarse filter before the exact distance check. You probably want to look a little at how GIS (Geographical Information Systems) works. There are lots of different techniques they use to search and partition things. Quadtree and oct-tree come to mind. Gord

- Re: find close (duplicate) points + create index at 2004-03-10 05:27:27 from Tom Lane

Next: From:MohanDate:2004-03-10 13:08:40Subject: Re: local usage was: Re: using pgsql on my comp only withoutPrevious: From: ghaverlaDate: 2004-03-10 12:17:26Subject: local usage was: Re: using pgsql on my comp only without tcp