Re: [PATCH] kNN for btree

From: Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru>
To: Anastasia Lubennikova <lubennikovaav(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [PATCH] kNN for btree
Date: 2019-03-06 17:11:41
Message-ID: 351c1866-5f38-5e7c-e848-73df9c6668b0@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox
Thread:
Lists: pgsql-hackers

Attached 9th version of the patches.

On 03.03.2019 12:46, Anastasia Lubennikova wrote:

> The following review has been posted through the commitfest application:
> make installcheck-world: tested, passed
> Implements feature: tested, passed
> Spec compliant: tested, passed
> Documentation: tested, passed
>
> Hi,
> thank you for your work on this patch.

Thank you for you review.

> Patch #1 is ready for commit.
> It only fixes lack of refactoring after INCLUDE index patch.
>
> Patches 2-7 are ready for committer's review.
> I found no significant problems in algorithm or implementation.
> Here are few suggestions to improve readability:
>
> 1) patch 0002.
> * Returned matched index clause exression.
> * Number of matched index column is returned in *indexcol_p.
>
> Typos in comment:
> exPression
> index columnS

"exPression" is fixed.
But there should be "column" because only single index column is matched.

> 2) patch 0002.
> + /*
> + * We allow any column of this index to match each pathkey; they
> + * don't have to match left-to-right as you might expect.
> + */
>
> Before refactoring this comment had a line about gist and sp-gist specific:
>
> - * We allow any column of the index to match each pathkey; they
> - * don't have to match left-to-right as you might expect. This is
> - * correct for GiST, and it doesn't matter for SP-GiST because
> - * that doesn't handle multiple columns anyway, and no other
> - * existing AMs support amcanorderbyop. We might need different
> - * logic in future for other implementations.
>
> Now, when the code was moved to a separate function, it is not clear why the
> same logic is ok for b-tree and other index methods. If I got it right, it
> doesn't affect the correctness of the algorithm, because b-tree specific
> checks are performed in another function. Still it would be better to
> explain it in the comment for future readers.

It seems that match_orderbyop_pathkey() is simply the wrong place for this
comment. I moved it into match_orderbyop_pathkeys() which is implementation of
ammatchorderby() for GiST an SP-GiST. Also I added old sentence about its
correctness for GiST and SP-GiST.

> 3) patch 0004
> if (!so->distanceTypeByVal)
> {
> so->state.currDistance = PointerGetDatum(NULL);
> so->state.markDistance = PointerGetDatum(NULL);
> }
>
> Why do we reset these fields only for !distanceTypeByVal?

These fields should be initialized (it is initialization, not reset) only for
by-ref types because before writing a new distance values to these fields,
the previous by-ref values are pfreed. The corresponding comment was added.

> 4) patch 0004
> + * _bt_next_item() -- Advance to next tuple on current page;
> + * or if there's no more, try to step to the next page with data.
> + *
> + * If there are no more matching records in the given direction
> */
>
> Looks like the last sentence of the comment is unfinished.

Yes, "false is returned" is missing. Fixed.

> 5) patch 0004
> _bt_start_knn_scan()
>
> so->currRightIsNearest = _bt_compare_current_dist(so, rstate, lstate);
> /* Reset right flag if the left item is nearer. */
> right = so->currRightIsNearest;
>
> If this comment relates to the string above it?

No, it relates only to string below. 'right' flag determines later the selected
scan direction, so 'currRightIsNearest' should be assigned to it. This comment
was rewritten.

> 6) patch 0004
> _bt_parallel_seize()
>
> + scanPage = state == &so->state
> + ? &btscan->btps_scanPage
> + : &btscan->btps_knnScanPage;
>
> This code looks a bit tricke to me. Why do we need to pass BTScanState state
> to _bt_parallel_seize() at all? Won't it be enough to choose the page before
> function call and pass it?

If we will pass page, then we will have to pass it through the whole function
tree:
_bt_parallel_seize()
_bt_steppage()
_bt_next_item()
_bt_next_nearest()
_bt_load_first_page()
_bt_init_knn_scan()
_bt_readnextpage()
_bt_parallel_readpage()
_bt_first()

I decided simply to add flag 'isKnn' to BtScanState, so the code now looks like
this:
scanPage = state->isKnn
? &btscan->btps_scanPage
: &btscan->btps_knnScanPage;

I also can offer to add 'id' (0/1) to BtScanState instead, then the code will
look like this:
scanPage = &btscan->btps_scanPages[state->id];

> 7) patch 0006
> + <title>Upgrade notes for version 1.6</title>
> +
> + <para>
> + In version 1.6 <literal>btree_gist</literal> switched to using in-core
> + distance operators, and its own implementations were removed. References to
> + these operators in <literal>btree_gist</literal> opclasses will be updated
> + automatically during the extension upgrade, but if the user has created
> + objects referencing these operators or functions, then these objects must be
> + dropped manually before updating the extension.
>
> Is this comment still relevant after the latest changes?
> Functions are not removed, they're still in contrib.

Yes, comment is still relevant. SQL functions and operators are dropped,
but C functions remain (see [1]).

> Patches #8 and #9 need more review and tests.
> I'll try to give a feedback on them in the week.
>
> P.S. many thanks for splitting the code into separate patches. It made review a lot easier.
>
> The new status of this patch is: Waiting on Author

[1] https://www.postgresql.org/message-id/CAPpHfdstf812dYObwMeu54P5HijHgURNdoJRc3jKxRj2LsQJRg%40mail.gmail.com

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
0001-Fix-get_index_column_opclass-v09.patch.gz application/gzip 799 bytes
0002-Introduce-ammatchorderby-function-v09.patch.gz application/gzip 6.1 KB
0003-Extract-structure-BTScanState-v09.patch.gz application/gzip 10.3 KB
0004-Add-kNN-support-to-btree-v09.patch.gz application/gzip 17.3 KB
0005-Add-btree-distance-operators-v09.patch.gz application/gzip 11.7 KB
0006-Remove-distance-operators-from-btree_gist-v09.patch.gz application/gzip 10.8 KB
0007-Add-regression-tests-for-kNN-btree-v09.patch.gz application/gzip 5.0 KB
0008-Allow-ammatchorderby-to-return-pathkey-sublists-v09.patch.gz application/gzip 995 bytes
0009-Add-support-of-array-ops-to-btree-kNN-v09.patch.gz application/gzip 5.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2019-03-06 17:12:54 Re: [HACKERS] Incomplete startup packet errors
Previous Message Robert Haas 2019-03-06 17:04:57 Re: patch to allow disable of WAL recycling