From: | Teodor Sigaev <teodor(at)sigaev(dot)ru> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> |
Subject: | Re: knngist - 0.8 |
Date: | 2010-09-07 14:54:14 |
Message-ID: | 4C865216.1090305@sigaev.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
>> http://www.sigaev.ru/misc/builtin_knngist_core-0.8.gz
>> http://www.sigaev.ru/misc/builtin_knngist_itself-0.8.gz
>> http://www.sigaev.ru/misc/builtin_knngist_proc-0.8.gz
>> http://www.sigaev.ru/misc/builtin_knngist_contrib_pg_trgm-0.8.gz
>> http://www.sigaev.ru/misc/builtin_knngist_contrib_btree_gist-0.8.gz
New version, synced with CVS HEAD
http://www.sigaev.ru/misc/builtin_knngist_itself-0.8.2.gz
http://www.sigaev.ru/misc/builtin_knngist_contrib_btree_gist-0.8.1.gz
> AFAICS, these patches include no documentation. That's pretty much a
> fatal flaw for a feature of this magnitude. At an absolute minimum,
> you need to update the system catalog documentation and the
> documentation on CREATE / ALTER OPERATOR CLASS. There might be some
> other places that need to be touched, also.
Oleg promised to do that
> + if (opform->oprresult == BOOLOID)
> + ereport(ERROR,
> +
> (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
> + errmsg("index ordering
> operators must not return boolean")));
>
> My first thought was that this code was there to prevent people from
> doing the wrong thing by accident. But I have a niggling feeling that
> you're actually relying on this for the correctness of the system. I
> hope I'm wrong, because I don't think that would be a very good idea.
This play is around do we really want to have support of boolean-distance in
GiST? I think no, because it's a strange idea to measure distance in true/false
measurement units. I can't imagine such real-life distance definition and never
heard about that.
Next, pg_amop_opr_fam_index requires uniqueness of operation in operation family
and a lot of places in planner believes in that. Suppose, changing that requires
a lot of work which has the single aim to support boolean distance in ORDER BY
clause.
> The GIST code code use more comments; and perhaps the names of some of
> the functions and structures could be chosen to be more descriptive.
> I think that what used to be called GISTSearchStack has apparently
> been replaced with DataPointer; it's not obvious to me that it's good
> to change the name, but if it is I don't think DataPointer is a good
GISTSearchStack is replaced by RBTree (GISTScanOpaqueData->stack), tree's nodes
contain a StackElem struct which represents list of pointers at the same
distance. Each pointer could be a pointer to the inner index's page or to the
heap's tuple and this struct is a DataPointer.
Note, list of DataPointer in StackElem struct is organized by non-obvious way:
we keep pointer to the head of list and pointer to the middle of list. New
pointer-to-heap is inserted in the beginning of list, pointers-to-index-page -
in the middle. That's done because we would like to:
1) pop pointers-to-heap as fast as possible, before any pointers-to-index-page
2) pop pointers-to-index-page to deep page (which is closer to leaf pages)
first. That's good for KNN performance and emulates classical first-depth
search in ordinary search.
> choice. gistindex_keytest has been replaced (sort of) by
> processIndexTuple, which again seems more generic than what it
> replaced.
Renamed, comments are improved
> Minor nit: the word "shoould" is mis-spelled.
fixed
BTW, now consistentFn is able to "manage" tree traversal - even for for ordinary
search, GiST will choose child page with minimal distance.
--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2010-09-07 14:56:35 | Re: can we publish a aset interface? |
Previous Message | Ron Mayer | 2010-09-07 14:47:33 | Re: Synchronization levels in SR |