Re: KNNGiST for knn-search (WIP)

From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: KNNGiST for knn-search (WIP)
Date: 2009-11-26 16:34:21
Message-ID: 4B0EAE0D.606@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi!

Contrib module is reworked as a patch for current GiST. Now GiST supports
KNN-search, the query looks like
SELECT * FROM point_tbl ORDER BY f1 <-> '0,1';
or
SELECT * FROM point_tbl WHERE f1 <@ '(-10,-10),(10,10)':: box ORDER BY f1 <-> '0,1';
Plans are:
EXPLAIN (COSTS OFF)
SELECT * FROM point_tbl ORDER BY f1 <-> '0,1';
QUERY PLAN
-----------------------------------------
Index Scan using gpointind on point_tbl
Index Cond: (f1 <-> '(0,1)'::point)
EXPLAIN (COSTS OFF)
SELECT * FROM point_tbl WHERE f1 <@ '(-10,-10),(10,10)':: box ORDER BY f1 <-> '0,1';
QUERY PLAN
------------------------------------------------------------------------------
Index Scan using gpointind on point_tbl
Index Cond: ((f1 <@ '(10,10),(-10,-10)'::box) AND (f1 <-> '(0,1)'::point))

pg_am now has new column amcanorderbyop (can order by operation), indexes with
this flag enabled can be used to speedup operations, which returns non-boolean
value, currently type of returned value should have default btree operator class
to perform sort.

Planner (find_usable_indexes function, actually) could push pathkey expression
into restriction clauses of index. I'm not fully satisfied with this piece of
code, but, may be, someone has a better idea. I though about adding separate
indexorderquals in IndexPath structure...

Both GiST's get methods are optimized and there is no overhead, since gistrescan
method can choose what traversal algorithm to use using information about types
of values returned by operations. If at least one of them returns non-boolean
result, then KNN-search will be performed.

The only change in interface of supporting functions is: consistentFn function
could return float8 non-negative value and it's mandatory to perform KNN-search.
Old-style consistent functions are supported.

Patch contains (it still requires rbtree-0.5 and point_ops-0.4 patches):
- GiST changes
- changes in point_ops to support knn-search
- contrib/pg_trgm now has new operation <-> returns distance between texts. This
operation is supported in KNN-search
- contrib/btree_gist provides <-> and its support for GiST for types int2, int4,
int8, float4, float8, money, oid, interval, time, date, timestamp and timestamptz

TODO:
- selectivity of ordering operation should be 1.0
- current patch remove support of IndexScanDesc->kill_prior_tuple, it's needed
to restore support if it will not create too big overhead
- documentation
--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/

Attachment Content-Type Size
builtin_knngist-0.4.gz application/x-tar 25.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Chris Browne 2009-11-26 16:36:01 Re: cvs chapters in our docs
Previous Message Tom Lane 2009-11-26 16:14:54 Re: ecpg.addons