Re: KNNGiST for knn-search (WIP)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: KNNGiST for knn-search (WIP)
Date: 2009-12-30 06:55:26
Message-ID: 603c8f070912292255u30e1983bi22ed5778bd2ce890@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2009/11/26 Teodor Sigaev <teodor(at)sigaev(dot)ru>:
> Hi!
>
> Contrib module is reworked as a patch for current GiST. Now GiST supports
> KNN-search, the query looks like
> SELECT * FROM point_tbl ORDER BY f1 <-> '0,1';
> or
> SELECT * FROM point_tbl WHERE f1 <@ '(-10,-10),(10,10)':: box ORDER BY f1
> <-> '0,1';
> Plans are:
> EXPLAIN (COSTS OFF)
> SELECT * FROM point_tbl ORDER BY f1 <-> '0,1';
>               QUERY PLAN
> -----------------------------------------
>  Index Scan using gpointind on point_tbl
>   Index Cond: (f1 <-> '(0,1)'::point)
> EXPLAIN (COSTS OFF)
> SELECT * FROM point_tbl WHERE f1 <@ '(-10,-10),(10,10)':: box ORDER BY f1
> <-> '0,1';
>                                  QUERY PLAN
> ------------------------------------------------------------------------------
>  Index Scan using gpointind on point_tbl
>   Index Cond: ((f1 <@ '(10,10),(-10,-10)'::box) AND (f1 <-> '(0,1)'::point))
>
>
> pg_am now has new column amcanorderbyop (can order by operation), indexes
> with this flag enabled can be used to speedup operations, which returns
> non-boolean value, currently type of returned value should have default
> btree operator class  to perform sort.
>
> Planner (find_usable_indexes function, actually) could push pathkey
> expression into restriction clauses of index. I'm not fully satisfied with
> this piece of code, but, may be, someone has a better idea. I though about
> adding separate indexorderquals in IndexPath structure...
>
> Both GiST's get methods are optimized and there is no overhead, since
> gistrescan method can choose what traversal algorithm to use using
> information about types of values returned by operations. If at least one of
> them returns non-boolean result, then KNN-search will be performed.
>
> The only change in interface of supporting functions is: consistentFn
> function could return float8 non-negative value and it's mandatory to
> perform KNN-search. Old-style consistent functions are supported.
>
> Patch contains (it still requires rbtree-0.5 and point_ops-0.4 patches):
> - GiST changes
> - changes in point_ops to support knn-search
> - contrib/pg_trgm now has new operation <-> returns distance between texts.
> This operation is supported in KNN-search
> - contrib/btree_gist provides <-> and its support for GiST for types int2,
> int4, int8, float4, float8, money, oid, interval, time, date, timestamp and
> timestamptz
>
> TODO:
> - selectivity of ordering operation should be 1.0
> - current patch remove support of  IndexScanDesc->kill_prior_tuple, it's
> needed to restore support if it will not create too big overhead
> - documentation

Based on the feedback provided on this patch so far, it looks like
some changes are probably needed, but it's not entirely clear whether
the feedback provided is sufficient to provide guidance on what
changes should be made. It does also need to be updated to CVS HEAD,
as it no longer applies cleanly.

I tend to feel that we should probably target this for 8.6 rather than
8.5. We are down to the last CommitFest, and while we don't have a
nailed-down criterion for what is "too big" for the last CommitFest of
a given release cycle, this is definitely a big, invasive patch. This
patch weights in at over 2400 adds/removes, and it's not boilerplate
stuff like updates to pg_proc entries, but real, significant changes.
I'm worried that applying something like this late in the release
cycle is just not a good idea, especially given the fact that it
probably still needs significant revising. However, I'm fairly
conservative by nature, so perhaps someone else will have a different
opinion, or maybe there is a way to restructure it so that the needed
changes are less invasive.

...Robert

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2009-12-30 08:37:39 Re: updateMinRecoveryPoint bug?
Previous Message Jeff Davis 2009-12-30 06:00:25 Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION