Re: [PATCH] kNN for btree

From: Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru>
To: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Cc: David Steele <david(at)pgmasters(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] kNN for btree
Date: 2018-09-26 15:37:54
Message-ID: 6a301bdc-1d95-a2ac-6e4e-23b1d5ceab51@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

Attached 3rd version of the patches rebased onto the current master.

Changes from the previous version:
- Added support of INCLUDE columns to get_index_column_opclass() (1st patch).
- Added parallel kNN scan support.
- amcanorderbyop() was transformed into ammatchorderby() which takes a List of
PathKeys and checks each of them with new function match_orderbyop_pathkey()
extracted from match_pathkeys_to_index(). I think that this design can be
used in the future to support a mix of ordinary and order-by-op PathKeys,
but I am not sure.

On 09.03.2017 15:00, Alexander Korotkov wrote:
> On Thu, Mar 2, 2017 at 5:57 PM, David Steele <david(at)pgmasters(dot)net
> <mailto:david(at)pgmasters(dot)net>> wrote:
>
> Hi Alexander,
>
> On 2/16/17 11:20 AM, Robert Haas wrote:
> > On Thu, Feb 16, 2017 at 10:59 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us
> <mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us>> wrote:
> >> Robert Haas <robertmhaas(at)gmail(dot)com
> <mailto:robertmhaas(at)gmail(dot)com>> writes:
> >>> On Thu, Feb 16, 2017 at 8:05 AM, Alexander Korotkov
> >>> <a(dot)korotkov(at)postgrespro(dot)ru <mailto:a(dot)korotkov(at)postgrespro(dot)ru>>
> wrote:
> >>>> My idea is that we need more general redesign of specifying
> ordering which
> >>>> index can produce.  Ideally, we should replace amcanorder,
> amcanbackward and
> >>>> amcanorderbyop with single callback. Such callback should
> take a list of
> >>>> pathkeys and return number of leading pathkeys index could
> satisfy (with
> >>>> corresponding information for index scan).  I'm not sure that
> other hackers
> >>>> would agree with such design, but I'm very convinced that we
> need something
> >>>> of this level of extendability. Otherwise we would have to
> hack our planner
> >>>> <-> index_access_method interface each time we decide to
> cover another index
> >>>> produced ordering.
> >>
> >>> Yeah.  I'm not sure if that's exactly the right idea.  But it
> seems
> >>> like we need something.
> >>
> >> That's definitely not exactly the right idea, because using it
> would
> >> require the core planner to play twenty-questions trying to
> guess which
> >> pathkeys the index can satisfy.  ("Can you satisfy some prefix
> of this
> >> pathkey list?  How about that one?")  It could be sensible to
> have a
> >> callback that's called once per index and hands back a list of
> pathkey
> >> lists that represent interesting orders the index could
> produce, which
> >> could be informed by looking aside at the PlannerInfo contents
> to see
> >> what is likely to be relevant to the query.
> >>
> >> But even so, I'm not convinced that that is a better design or more
> >> maintainable than the current approach.  I fear that it will
> lead to
> >> duplicating substantial amounts of code and knowledge into each
> index AM,
> >> which is not an improvement; and if anything, that increases
> the risk of
> >> breaking every index AM anytime you want to introduce some
> fundamentally
> >> new capability in the area.  Now that it's actually practical
> to have
> >> out-of-core index AMs, that's a bigger concern than it might
> once have
> >> been.
> >
> > Yeah, that's all true.  But I think Alexander is right that just
> > adding amcandoblah flags ad infinitum doesn't feel good either.  The
> > interface isn't really arm's-length if every new thing somebody
> wants
> > to do something new requires another flag.
> >
> >> Also see the discussion that led up to commit ed0097e4f.  Users
> objected
> >> the last time we tried to make index capabilities opaque at the
> SQL level,
> >> so they're not going to like a design that tries to hide that
> information
> >> even from the core C code.
> >
> > Discoverability is definitely important, but first we have to figure
> > out how we're going to make it work, and then we can work out how to
> > let users see how it works.
>
> Reading through this thread I'm concerned that this appears to be
> a big
> change making its first appearance in the last CF.  There is also the
> need for a new patch and a general consensus of how to proceed.
>
> Yes, refactoring of amcanorder/amcanorderbyop should be very thoughtful.
>
> I recommend moving this patch to 2017-07 or marking it RWF.
>
>
> I agree. Done.
>
> ------
> Alexander Korotkov
> Postgres Professional: http://www.postgrespro.com
> <http://www.postgrespro.com/>
> The Russian Postgres Company

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
0001-Fix-get_index_column_opclass-v03.patch text/x-patch 1.2 KB
0002-Introduce-ammatchorderby-function-v03.patch text/x-patch 20.7 KB
0003-Extract-structure-BTScanState-v03.patch text/x-patch 43.2 KB
0004-Add-kNN-support-to-btree-v03.patch text/x-patch 59.3 KB
0005-Add-btree-distance-operators-v03.patch text/x-patch 77.5 KB
0006-Remove-distance-operators-from-btree_gist-v03.patch text/x-patch 101.2 KB
0007-Add-regression-tests-for-kNN-btree-v03.patch text/x-patch 28.7 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2018-09-26 15:54:43 Re: transction_timestamp() inside of procedures
Previous Message Stephen Frost 2018-09-26 15:30:31 Re: Online verification of checksums