Re: [PATCH] kNN for btree

From: Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru>
To: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Cc: Anastasia Lubennikova <lubennikovaav(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] kNN for btree
Date: 2019-03-14 22:11:58
Message-ID: ed2731df-2400-59ab-7b5f-ece685208d3d@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Attached 10th versions of the patches.

Fixed two bugs in patches 3 and 10 (see below).
Patch 3 was extracted from the main patch 5 (patch 4 in v9).

On 11.03.2019 20:42, Alexander Korotkov wrote:

> Hi!
>
> I've some questions regarding this patchset.
>
> 1) This comment needs to be revised. Now, B-tree supports both
> ammatchorderby and amcanbackward. How do we guarantee that kNN is not
> backwards scan?
>
> /*
> * Only forward scan is supported with reordering. Note: we can get away
> * with just Asserting here because the system will not try to run the
> * plan backwards if ExecSupportsBackwardScan() says it won't work.
> * Currently, that is guaranteed because no index AMs support both
> * ammatchorderby and amcanbackward; if any ever do,
> * ExecSupportsBackwardScan() will need to consider indexorderbys
> * explicitly.
> */

Yes, there was problem with backward kNN scans: they were not disabled in
ExecSupportsBackwardScan(). This can lead to incorrect results for backward
fetches from cursors. Corresponding regression test is included into patch #8.
And the comment was also fixed.

> 2) Why this should be so?
>
> EXPLAIN (COSTS OFF)
> SELECT thousand, tenthous FROM tenk1 WHERE thousand IN (5, 120, 3456, 23)
> ORDER BY thousand DESC, tenthous <-> 3500;
> QUERY PLAN
> ---------------------------------------------------------------------
> Sort
> Sort Key: thousand DESC, ((tenthous <-> 3500))
> -> Index Only Scan using tenk1_thous_tenthous on tenk1
> Index Cond: (thousand = ANY ('{5,120,3456,23}'::integer[]))
> (4 rows)
>
> EXPLAIN (COSTS OFF)
> SELECT thousand, tenthous FROM tenk1 WHERE thousand IN (5, 120, 3456, 23)
> ORDER BY thousand, tenthous <-> 3500;
> QUERY PLAN
> ---------------------------------------------------------------
> Index Only Scan using tenk1_thous_tenthous on tenk1
> Index Cond: (thousand = ANY ('{5,120,3456,23}'::integer[]))
> Order By: (thousand AND (tenthous <-> 3500))
> (3 rows)
>
> It seems that we restart bidirectional scan for each value specified
> in IN clause. Then why does it matter whether it is forward or
> backward scan?

kNN scans now can only be forward, and in forward btree scans array iteration
order matches the index sort order. We could determine array iteration order
by ScanKey strategy, but ASC/DESC info flag is not passed now to the place of
ScanKeys initialization (see ExecIndexBuildScanKeys()). ASC/DESC passing needs
refactoring of whole passing of orderbyclauses/orderbyclausecols.

There also was a problem in btmmatchorderby()/match_patchkey_to_indexcol():
array keys were incorrectly matched to DESC index columns.

> 3) What is idea of 8th patch? It doesn't seem to be really needed for
> subsequent 9th patch, because we anyway ignore partial match pathkeys.
> Then why bother producing them? Is it stub for further incremental
> sort?

Yes, this is a kind of stub for incremental sort. But also this simplifies
a bit ammatchorderby() functions, because they should not care about the length
of returned pathkey list, they simply return after the first unsupported
pathkey. I event think that ammacthorderby() should not depend on whether we
support incremental sorting or not.

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
0001-Fix-get_index_column_opclass-v10.patch.gz application/gzip 802 bytes
0002-Introduce-ammatchorderby-function-v10.patch.gz application/gzip 6.1 KB
0003-Enable-ORDER-BY-operator-scans-on-ordered-indexes-v10.patch.gz application/gzip 1.4 KB
0004-Extract-structure-BTScanState-v10.patch.gz application/gzip 10.3 KB
0005-Add-kNN-support-to-btree-v10.patch.gz application/gzip 17.2 KB
0006-Add-btree-distance-operators-v10.patch.gz application/gzip 11.7 KB
0007-Remove-distance-operators-from-btree_gist-v10.patch.gz application/gzip 10.8 KB
0008-Add-regression-tests-for-kNN-btree-v10.patch.gz application/gzip 5.9 KB
0009-Allow-ammatchorderby-to-return-pathkey-sublists-v10.patch.gz application/gzip 1000 bytes
0010-Add-support-of-array-ops-to-btree-kNN-v10.patch.gz application/gzip 6.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mitar 2019-03-14 23:19:40 Re: Feature: temporary materialized views
Previous Message Tom Lane 2019-03-14 22:08:08 Re: hyrax vs. RelationBuildPartitionDesc