Re: [PATCH] kNN for btree

From: Anastasia Lubennikova <lubennikovaav(at)gmail(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru>
Subject: Re: [PATCH] kNN for btree
Date: 2019-03-03 09:46:15
Message-ID: 155160637533.16480.9693908948868055357.pgcf@coridan.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: tested, passed
Documentation: tested, passed

Hi,
thank you for your work on this patch.

Patch #1 is ready for commit.
It only fixes lack of refactoring after INCLUDE index patch.

Patches 2-7 are ready for committer's review.
I found no significant problems in algorithm or implementation.
Here are few suggestions to improve readability:

1) patch 0002.
* Returned matched index clause exression.
* Number of matched index column is returned in *indexcol_p.

Typos in comment:
exPression
index columnS

2) patch 0002.
+ /*
+ * We allow any column of this index to match each pathkey; they
+ * don't have to match left-to-right as you might expect.
+ */

Before refactoring this comment had a line about gist and sp-gist specific:

- * We allow any column of the index to match each pathkey; they
- * don't have to match left-to-right as you might expect. This is
- * correct for GiST, and it doesn't matter for SP-GiST because
- * that doesn't handle multiple columns anyway, and no other
- * existing AMs support amcanorderbyop. We might need different
- * logic in future for other implementations.

Now, when the code was moved to a separate function, it is not clear why the same logic is ok for b-tree and other index methods.
If I got it right, it doesn't affect the correctness of the algorithm, because b-tree specific checks are performed in another function.
Still it would be better to explain it in the comment for future readers.

3) patch 0004
if (!so->distanceTypeByVal)
{
so->state.currDistance = PointerGetDatum(NULL);
so->state.markDistance = PointerGetDatum(NULL);
}

Why do we reset these fields only for !distanceTypeByVal?

4) patch 0004
+ * _bt_next_item() -- Advance to next tuple on current page;
+ * or if there's no more, try to step to the next page with data.
+ *
+ * If there are no more matching records in the given direction
*/

Looks like the last sentence of the comment is unfinished.

5) patch 0004
_bt_start_knn_scan()

so->currRightIsNearest = _bt_compare_current_dist(so, rstate, lstate);
/* Reset right flag if the left item is nearer. */
right = so->currRightIsNearest;

If this comment relates to the string above it?

6) patch 0004
_bt_parallel_seize()

+ scanPage = state == &so->state
+ ? &btscan->btps_scanPage
+ : &btscan->btps_knnScanPage;

This code looks a bit tricke to me. Why do we need to pass BTScanState state to _bt_parallel_seize() at all?
Won't it be enough to choose the page before function call and pass it?

7) patch 0006
+ <title>Upgrade notes for version 1.6</title>
+
+ <para>
+ In version 1.6 <literal>btree_gist</literal> switched to using in-core
+ distance operators, and its own implementations were removed. References to
+ these operators in <literal>btree_gist</literal> opclasses will be updated
+ automatically during the extension upgrade, but if the user has created
+ objects referencing these operators or functions, then these objects must be
+ dropped manually before updating the extension.

Is this comment still relevant after the latest changes?
Functions are not removed, they're still in contrib.

Patches #8 and #9 need more review and tests.
I'll try to give a feedback on them in the week.

P.S. many thanks for splitting the code into separate patches. It made review a lot easier.

The new status of this patch is: Waiting on Author

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Banck 2019-03-03 10:51:48 Re: Online verification of checksums
Previous Message Fabien COELHO 2019-03-03 08:55:58 Re: pgbench - add pseudo-random permutation function