|From:||Emre Hasegeli <emre(at)hasegeli(dot)com>|
|To:||Alexander Korotkov <aekorotkov(at)gmail(dot)com>|
|Cc:||PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>|
|Subject:||Re: KNN-GiST with recheck|
|Views:||Raw Message | Whole Thread | Download mbox | Resend email|
> > 1. This patch introduces a new "polygon <-> point" operator. That seems
> > useful on its own, with or without this patch.
> Yeah, but exact-knn cant come with no one implementation. But it would
> better come in a separate patch.
I tried to split them. Separated patches are attached. I changed
the order of the arguments as point <-> polygon, because point was
the first one on all the others. Its commutator was required for
the index, so I added it on the second patch. I also added tests
for the operator. I think it is ready for committer as a separate
patch. We can add it to the open CommitFest.
I have made some cosmetic changes on the patches. I hope they are
I added support to point <-> circle operator with the same GiST
distance function you added for polygon. I can change it, if it is not
the right way.
> > 2. I wonder how useful it really is to allow mixing exact and non-exact
> > return values from the distance function. The distance function included in
> > the patch always returns recheck=true. I have a feeling that all other
> > distance function will also always return either true or false.
> For geometrical datatypes recheck variations in consistent methods are also
> very rare (I can't remember any). But imagine opclass for arrays where keys
> have different representation depending on array length. For such opclass
> and knn on similarity recheck flag could be useful.
I also wonder how useful it is. Your example is convincing, but maybe
setting it index-wide will make the decisions on the framework easier.
For example, how hard would it be to decide if further sorting is
required or not on the planner?
> > 4. (as you mentioned in the other thread: ) It's a modularity violation
> > that you peek into the heap tuple from gist. I think the proper way to do
> > this would be to extend the IndexScan executor node to perform the
> > re-shuffling of tuples that come from the index in wrong order, or perhaps
> > add a new node type for it.
> > Of course that's exactly what your partial sort patch does :-). I haven't
> > looked at that in detail, but I don't think the approach the partial sort
> > patch takes will work here as is. In the KNN-GiST case, the index is
> > returning tuples roughly in the right order, but a tuple that it returns
> > might in reality belong somewhere later in the ordering. In the partial
> > sort patch, the "input stream" of tuples is divided into non-overlapping
> > groups, so that the tuples within the group are not sorted, but the groups
> > are. I think the partial sort case is a special case of the KNN-GiST case,
> > if you consider the lower bound of each tuple to be the leading keys that
> > you don't need to sort.
> Yes. But, for instance btree accesses heap for unique checking. Is really
> it so crimilal? :-)
> This is not only question of a new node or extending existing node. We need
> to teach planner/executor access method can return value of some expression
> which is lower bound of another expression. AFICS now access method can
> return only original indexed datums and TIDs. So, I afraid that enormous
> infrastructure changes are required. And I can hardly imagine what they
> should look like.
Unfortunately, I am not experienced enough to judge your implementation.
As far as I understand the problem is partially sorting rows on
the index scan node. It can lead the planner to choose non-optimal
plans, because of not taking into account the cost of sorting.
|Next Message||Kevin Grittner||2014-08-03 15:55:47||Re: Usability improvements for pg_stop_backup()|
|Previous Message||Bruce Momjian||2014-08-03 13:27:56||Re: Proposed changing the definition of decade for date_trunc and extract|