Re: [HACKERS] [PATCH] kNN for SP-GiST

From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: Nikita Glukhov <n(dot)gluhov(at)postgrespro(dot)ru>, David Steele <david(at)pgmasters(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] [PATCH] kNN for SP-GiST
Date: 2018-08-30 09:17:44
Message-ID: CAPpHfdvGtEzwRHqOUM67ixvqLo+ZbGhV+KEk_YMLZZM_WTCy0g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi!

On Tue, Aug 28, 2018 at 12:50 PM Alexander Korotkov
<a(dot)korotkov(at)postgrespro(dot)ru> wrote:
> On Thu, Jul 26, 2018 at 8:39 PM Andrey Borodin <x4mmm(at)yandex-team(dot)ru> wrote:
> > I'm not sure in architectural point of view: supporting two ways (list and heap) to store result seems may be a bit heavy, but OK. At least, it has meaningful benefits.
>
> It seems that Nikita have reworked it that way after my suspect that
> switching regular scans to pairing heap *might* cause a regression.
> However, I didn't insist that we need to support two ways. For
> instance, if we can prove that there is no regression then it's fine
> to use just heap...

So, I decided to check does it really matter to support both list and
pairing heap? Or can we just always use pairing heap?

I wrote following simple test.

Points are uniformly distributed in box (0,0)-(1,1).
# create table spgist_test as (select point(random(), random()) p from
generate_series(1,1000000) i);
# create index spgist_test_idx on spgist_test using spgist(p);

spgist_bench() walks over this box with given step and queries it each
time with boxes of given size.
CREATE FUNCTION spgist_bench(step float8, size float8) RETURNS void AS $$
DECLARE
x float8;
y float8;
BEGIN
y := 0.0;
WHILE y <= 1.0 LOOP
x := 0.0;
WHILE x <= 1.0 LOOP
PERFORM * FROM spgist_test WHERE p <@ box(point(x,y),
point(x+size, y+size));
x := x + step;
END LOOP;
y := y + step;
END LOOP;
END;
$$ LANGUAGE plpgsql;

It fist I've compared current patch with modified patch, which I made
to to always queue scan.

# Current patch (use list)
x run 1 run 2 run 3
0.1 1206 1230 1231
0.01 2862 2813 2802
0.003 13915 13911 13900

# Always use queue
x run 1 run 2 run 3
0.1 1238 1189 1170
0.01 2740 2852 2769
0.003 13627 13751 13757

And it appears that difference is less than statistical error. But
then I compared that with master. And it appears that master is much
faster.

# Master
x run 1 run 2 run 3
0.1 725 691 700
0.01 1488 1480 1495
0.003 6696 6210 6295

It's probably because we're anyway allocating separate queue memory
context. I'll explore this more and come up with details.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Gunnlaugur Thor Briem 2018-08-30 09:24:13 Re: pg_upgrade fails saying function unaccent(text) doesn't exist
Previous Message Yugo Nagata 2018-08-30 09:10:54 Re: pg_verify_checksums -d option (was: Re: pg_verify_checksums -r option)