Re: KNN-GiST with recheck

From: Emre Hasegeli <emre(at)hasegeli(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Subject: Re: KNN-GiST with recheck
Date: 2014-08-03 13:48:09
Message-ID: 20140803134809.GA58181@hasegeli-2.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> > 1. This patch introduces a new "polygon <-> point" operator. That seems
> > useful on its own, with or without this patch.
>
> Yeah, but exact-knn cant come with no one implementation. But it would
> better come in a separate patch.

I tried to split them. Separated patches are attached. I changed
the order of the arguments as point <-> polygon, because point was
the first one on all the others. Its commutator was required for
the index, so I added it on the second patch. I also added tests
for the operator. I think it is ready for committer as a separate
patch. We can add it to the open CommitFest.

I have made some cosmetic changes on the patches. I hope they are
useful.

I added support to point <-> circle operator with the same GiST
distance function you added for polygon. I can change it, if it is not
the right way.

> > 2. I wonder how useful it really is to allow mixing exact and non-exact
> > return values from the distance function. The distance function included in
> > the patch always returns recheck=true. I have a feeling that all other
> > distance function will also always return either true or false.
>
> For geometrical datatypes recheck variations in consistent methods are also
> very rare (I can't remember any). But imagine opclass for arrays where keys
> have different representation depending on array length. For such opclass
> and knn on similarity recheck flag could be useful.

I also wonder how useful it is. Your example is convincing, but maybe
setting it index-wide will make the decisions on the framework easier.
For example, how hard would it be to decide if further sorting is
required or not on the planner?

> > 4. (as you mentioned in the other thread: ) It's a modularity violation
> > that you peek into the heap tuple from gist. I think the proper way to do
> > this would be to extend the IndexScan executor node to perform the
> > re-shuffling of tuples that come from the index in wrong order, or perhaps
> > add a new node type for it.
> >
> > Of course that's exactly what your partial sort patch does :-). I haven't
> > looked at that in detail, but I don't think the approach the partial sort
> > patch takes will work here as is. In the KNN-GiST case, the index is
> > returning tuples roughly in the right order, but a tuple that it returns
> > might in reality belong somewhere later in the ordering. In the partial
> > sort patch, the "input stream" of tuples is divided into non-overlapping
> > groups, so that the tuples within the group are not sorted, but the groups
> > are. I think the partial sort case is a special case of the KNN-GiST case,
> > if you consider the lower bound of each tuple to be the leading keys that
> > you don't need to sort.
>
> Yes. But, for instance btree accesses heap for unique checking. Is really
> it so crimilal? :-)
> This is not only question of a new node or extending existing node. We need
> to teach planner/executor access method can return value of some expression
> which is lower bound of another expression. AFICS now access method can
> return only original indexed datums and TIDs. So, I afraid that enormous
> infrastructure changes are required. And I can hardly imagine what they
> should look like.

Unfortunately, I am not experienced enough to judge your implementation.
As far as I understand the problem is partially sorting rows on
the index scan node. It can lead the planner to choose non-optimal
plans, because of not taking into account the cost of sorting.

Attachment Content-Type Size
polygon-distance-op-1.patch text/plain 12.7 KB
knn-gist-recheck-2.patch text/plain 45.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kevin Grittner 2014-08-03 15:55:47 Re: Usability improvements for pg_stop_backup()
Previous Message Bruce Momjian 2014-08-03 13:27:56 Re: Proposed changing the definition of decade for date_trunc and extract