Re: KNN-GiST with recheck

From: Emre Hasegeli <emre(at)hasegeli(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: KNN-GiST with recheck
Date: 2014-10-06 09:36:09
Message-ID: 20141006093609.GA12899@hasegeli.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> Thanks. The main question now is design of this patch. Currently, it does
> all the work inside access method. We already have some discussion of pro
> and cons of this method. I would like to clarify alternatives now. I can
> see following way:
>
> 1. Implement new executor node which performs sorting by priority queue.
> Let's call it "Priority queue". I think it should be separate node from
> "Sort" node. Despite "Priority queue" and "Sort" are essentially similar
> from user view, they would be completely different in implementation.
> 2. Implement some interface to transfer distance values from access
> method to "Priority queue" node.

If we assume that all of them need recheck, maybe it can be done
without passing distance values.

> 3. Somehow tell the planner that it could use "Priority queue" in
> corresponding cases. I see two ways of doing this:
> - Add flag to operator in opclass indicating that index can only
> order by lower bound of "col op value", not by "col op value" itself.
> - Define new relation between operators. Value of one operator could
> be lower bound for value of another operator. So, planner can
> put "Priority
> queue" node when lower bound ordering is possible from index. Also "ALTER
> OPERATOR" command would be reasonable, so extensions could upgrade.

I think, it would be better to make it a property of the operator
class. We can add a column to pg_amop or define another value for
amoppurpose on pg_amop. Syntax can be something like this:

CREATE OPERATOR CLASS circle_ops DEFAULT
FOR TYPE circle USING gist AS
OPERATOR 15 <->(circle, point) FOR ORDER BY pg_catalog.float_ops LOWER BOUND;

While looking at it, I realize that current version of the patch does
not use the sort operator family defined with the operator class. It
assumes that the distance function will return values compatible with
the operator. Operator class definition makes me think that there is
not such an assumption.

> Besides overhead, this way makes significant infrastructural changes. So,
> it may be over-engineering. However, it's probably more clean and beautiful
> solution.
> I would like to get some feedback from people familiar with KNN-GiST like
> Heikki or Tom. What do you think about this? Any other ideas?

I would be happy to test and review the changes. I think it is nice
to solve the problem in a generalized way improving the access method
infrastructure. Definitely, we should have a consensus on the design
before working on the infrastructure changes.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2014-10-06 09:57:25 Re: Patch to support SEMI and ANTI join removal
Previous Message Ali Akbar 2014-10-06 08:44:55 Re: Add generate_series(numeric, numeric)