Re: [HACKERS] [PATCH] kNN for SP-GiST

From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] [PATCH] kNN for SP-GiST
Date: 2018-08-30 12:57:07
Message-ID: CAPpHfdscCV83E0Ks4Hk=jEYQLA4sL--W_5ty=TT_xyZ0DL88_A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Aug 30, 2018 at 12:41 PM Alexander Korotkov
<a(dot)korotkov(at)postgrespro(dot)ru> wrote:
> Right, performance regression appears to be caused by queue memory
> context allocation. I've tried to apply the same approach that we've
> in GiST: allocate separate memory context for queue only at second
> rescan call. And it appears to work, there is no measurable
> regression in comparison to master.
>
> Patch v8
> x run 1 run 2 run 3
> 0.1 680 660 662
> 0.01 1403 1395 1508
> 0.003 6561 6475 6223
>
> Revised version of patch is attached. I've squashed patchset into one
> patch. Later I'll split it into fewer parts before committing. I'm
> going to also make some benchmarking of KNN itself: GiST vs SP-GiST.

I've made KNN GiST vs SP-GiST benchmarking similar to what I've done
previously for plain search.

create table knn_test as (select point(random(), random()) p from
generate_series(1,1000000) i);
create index knn_test_idx on knn_test using spgist(p);
create index knn_test_idx on knn_test using gist(p);

CREATE FUNCTION knn_bench(step float8, lim int) RETURNS void AS $$
DECLARE
x float8;
y float8;
BEGIN
y := 0.0;
WHILE y <= 1.0 LOOP
x := 0.0;
WHILE x <= 1.0 LOOP
PERFORM * FROM knn_test ORDER BY p <-> point(x,y) LIMIT lim;
x := x + step;
END LOOP;
y := y + step;
END LOOP;
END;
$$ LANGUAGE plpgsql;

And the results are following.

# GiST
step limit run 1 run 2 run 3 buffers
0.1 1000 156 161 158 123191
0.03 100 298 305 301 122902
0.01 10 1216 1214 1212 138591

# SP-GiST
step limit run 1 run 2 run 3 buffers
0.1 1000 147 152 153 126446
0.03 100 227 223 226 131241
0.01 10 734 733 730 175908

For me it looks expectable. SP-GiST takes less CPU, but uses more
buffers. It's pretty same to what we have for plain scans. So, I see
no anomaly here.

Generally patch looks close to committable shape for me. I'm going to
revise code and documentation again, split it up, and then propose to
commit.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Etsuro Fujita 2018-08-30 12:58:04 Re: Problem while updating a foreign table pointing to a partitioned table on foreign server
Previous Message Erik Rijkers 2018-08-30 12:51:45 Re: rare crash - FailedAssertion snapbuild.c Line: 580