Skip site navigation (1) Skip section navigation (2)

KNNGiST for knn-search

From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: KNNGiST for knn-search
Date: 2009-11-23 17:44:01
Message-ID: 4B0AC9E1.5050509@sigaev.ru (view raw or flat)
Thread:
Lists: pgsql-hackers
Hi there,

http://www.sigaev.ru/misc/knngist-0.11.tar.gz

we'd like to present contrib module for CVS HEAD, which contains implementation 
of knn (k nearest neighbourhood) search in PostgreSQL, see README.knngist for
details. KNNGiST is an extension of existing GiST, which inherits most of
their code, so it's has recovery (WAL-logged)  and concurrency support.
Basically, it introduces a new distance-based priority queue tree traversal
algorithm (instead of depth-first in plain GiST), so index scan returns rows
sorted by closiness to a query. Notice, index returns all rows, so one should
use LIMIT clause to specify k (which is usually small) to get real benefits.
We get 300x times better performance on real-life data (about 1 mln points):

Module requires rbtree and point_ops patches applied. 
(http://archives.postgresql.org/message-id/4B0A8DFA.7050009@sigaev.ru and 
http://archives.postgresql.org/message-id/4B0A8F0F.3020308@sigaev.ru)

Old way:
SELECT coordinates, (coordinates <-> '5.0,5.0'::point) AS dist FROM spots
order by dist asc LIMIT 10;

Time: 1024.242 ms

knn-search:
SELECT coordinates, (coordinates <-> '5.0,5.0'::point) AS dist FROM spots
WHERE coordinates  >< '5.0,5.0'::point LIMIT 10;

Time: 3.158 ms


We didn't patch existing implementation of GiST for several reasons:

1. KNNGiST is about 5% slower than GiST on non-knn search queries, like
   contains or contained by, because of some overhead of new algorithm of
   tree traversal
2.  KNNGiST can't be used in  bitmap index scan, which destroys order of results,
   We don't know the way to forbid bitmap index scan only for knn queries.
   Current version of KNNGiST doesn't distinguish knn-search and usual search
   and postgres doesn't know about ordered output from KNNGiST.

We see several ways to add KNNGiST to PostgreSQL:

1. Patch existing GiST.
    Con - see problems above.
    Pro - any existing GiST opclasses will work with KNNGiST.

2. add KNNGIST as a contrib module like now.
    Con - there is no way in PostgreSQL to test other modules, which depends
          on KNNGiST. For example, it's easy to add support for trigrams, but
          then we add dependence on contrib/pg_trgm module.

3. add KNNGiST as separate access method into core of PostgreSQL.
    We think it the best way, since we don't touch existing GiST and opclasses,
    and could use KNNGiST in other contrib modules


Separate problem is query planning.

1. To use KNNGiST we need to use index scan ! To prevent seqscan on table,
    operator >< has extremely high cost.
2. To prevent bitmap index scan, KNNGiST doesn't have bitmap index scan
    interface at all (no amgetbitmap method).


-- 
Teodor Sigaev                                   E-mail: teodor(at)sigaev(dot)ru
                                                    WWW: http://www.sigaev.ru/

Attachment: README.knngist
Description: text/plain (11.0 KB)

Responses

pgsql-hackers by date

Next:From: Andrew DunstanDate: 2009-11-23 17:44:02
Subject: Re: Re: [COMMITTERS] pgsql: Remove -w (--ignore-all-space) option from pg_regress's diff
Previous:From: Simon RiggsDate: 2009-11-23 17:39:16
Subject: Re: Partitioning option for COPY

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group