KNN-GiST with recheck

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: KNN-GiST with recheck
Date: 2014-01-13 17:17:12
Message-ID: CAPpHfdu_qBLNRnv-r=_tofZYYa6-r=Z5_MGF4_phAOkWcYxfRg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hackers!

This patch was split from thread:
http://www.postgresql.org/message-id/CAPpHfdscOX5an71nHd8WSUH6GNOCf=V7wgDaTXdDd9=goN-gfA@mail.gmail.com

I've split it to separate thead, because it's related to partial sort only
conceptually not technically. Also I renamed it to "knn-gist-recheck" from
"partial-knn" as more appropriate name. In the attached version docs are
updated. Possible weak point of this patch design is that it fetches heap
tuple from GiST scan. However, I didn't receive any notes about its design,
so, I'm going to put it to commitfest.

Here goes a desription of this patch same as in original thread.

KNN-GiST provides ability to get ordered results from index, but this order
is based only on index information. For instance, GiST index contains
bounding rectangles for polygons, and we can't get exact distance to
polygon from index (similar situation is in PostGIS). In attached patch,
GiST distance method can set recheck flag (similar to consistent method).
This flag means that distance method returned lower bound of distance and
we should recheck it from heap.

See an example.

create table test as (select id, polygon(3+(random()*10)::int,
circle(point(random(), random()), 0.0003 + random()*0.001)) as p from
generate_series(1,1000000) id);
create index test_idx on test using gist (p);

We can get results ordered by distance from polygon to point.

postgres=# select id, p <-> point(0.5,0.5) from test order by p <->
point(0.5,0.5) limit 10;
id | ?column?
--------+----------------------
755611 | 0.000405855808916853
807562 | 0.000464123777564343
437778 | 0.000738524708741959
947860 | 0.00076250998760724
389843 | 0.000886362723569568
17586 | 0.000981960100555216
411329 | 0.00145338112316853
894191 | 0.00149399559703506
391907 | 0.0016647896049741
235381 | 0.00167554614889509
(10 rows)

It's fast using just index scan.

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.29..1.86 rows=10 width=36) (actual time=0.180..0.230
rows=10 loops=1)
-> Index Scan using test_idx on test (cost=0.29..157672.29
rows=1000000 width=36) (actual time=0.179..0.228 rows=10 loops=1)
Order By: (p <-> '(0.5,0.5)'::point)
Total runtime: 0.305 ms
(4 rows)

------
With best regards,
Alexander Korotkov.

Attachment Content-Type Size
knn-gist-recheck-1.patch.gz application/x-gzip 7.1 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2014-01-13 17:30:27 Re: nested hstore patch
Previous Message Alexander Korotkov 2014-01-13 17:07:24 Re: GIN improvements part 1: additional information