From: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
---|---|
To: | Teodor Sigaev <teodor(at)sigaev(dot)ru> |
Cc: | Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: KNNGiST for knn-search |
Date: | 2009-11-24 12:37:29 |
Message-ID: | 4B0BD389.5000509@enterprisedb.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Teodor Sigaev wrote:
>>> 1. KNNGiST is about 5% slower than GiST on non-knn search queries, like
>>> contains or contained by, because of some overhead of new algorithm of
>>> tree traversal
>>
>> Is it possible to use the regular GiST traversal algorithm on a
>> KNNGiST-tree, when performing regular GiST searches that don't require a
>> particular order?
> New algorithm works much more with memory for allocation/free to manage
> lists and it's a single reason of performance loss. Choosing of
> algorithm could not be done by consistent function, it should be done at
> least in amrescan method or even earlier - in planner.
Ok, that sounds good. The bottom line is that you can use the same
on-disk tree with both algorithms. No need for a separate indexam in
that case.
> One idea:
> SELECT p FROM pt WHERE p << '5.0,5.0'::point ORDER BY (p <->
> '5.0,5.0'::point) DESC LIMIT 10;
> And add <-> to opclass (but for now any indexable operation should
> return boolean type).
You really shouldn't need to have a WHERE clause.
> Of course, KNNGiST should be modified to support
> not only k-nearest search but k-"farest" search and NULLS LAST/FIRST.
Well, as long as the planner knows the capabilities of the indexam, it
can just fall back to a seqscan+sort if the query can't be sped up with
the index.
> And now you can specify p >< 'one point' AND p >< 'another
> point', but it's impossible to do that by ORDER BY clause.
Huh, what does that mean? Is it like "ORDER BY (min( p >< 'one point', p
>< 'another point')" ?
> Second idea with non-standard syntax.
> SELECT ... ORDER BY PROXIMITY OF expression[, expression [..]] TO
> expression[, expression [..]] USING [operator [, operator [..]]
> and operator is distance operator, i.e. it's not a member of btree
> opclass, but returns non-negative float8 value.
>
> Without index it will be essentially the same as
> ORDER BY expression operator expression[ + ..] DESC NULLS LAST
We already have the syntax to represent the query, using ORDER BY. IMHO
we just need to teach the planner that when it sees a query like that,
it can use a GiST index to speed it up. A number of indexam and operator
class API changes are probably required, but it should be invisible to
the user.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
From | Date | Subject | |
---|---|---|---|
Next Message | Daniel Farina | 2009-11-24 13:00:30 | Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION |
Previous Message | Pavel Stehule | 2009-11-24 12:37:18 | Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION |