Re: workaround for expensive KNN?

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: PostgreSQL - Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: workaround for expensive KNN?
Date: 2011-04-08 13:22:36
Message-ID: Pine.LNX.4.64.1104081721150.9772@sn.sai.msu.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Probably, you miss two-columnt index. From my early post:
http://www.sai.msu.su/~megera/wiki/knngist

=# CREATE INDEX spots_idx ON spots USING knngist (coordinates, to_tsvector('french',address));
=# SELECT id, address, (coordinates <-> '(2.29470491409302,48.858263472125)'::point) AS dist
FROM spots WHERE coordinates >< '(2.29470491409302,48.858263472125)'::point
AND to_tsvector('french',address) @@ to_tsquery('french','mars') LIMIT 10;
id | address | dist
---------+-------------------------------------------------------------+---------------------
366096 | 1st Floor Tour Eiffel | Champs de Mars, Paris 75007, France | 2.32488941293945e-05
4356328 | r Champ de Mars 75007 PARIS | 0.00421854756964406
5200167 | Champ De Mars 75007 Paris | 0.00453564562587288
9301676 | Champ de Mars, 75007 Paris, | 0.00453564562587288
2152213 | 16, ave Rapp, Champ de Mars, Tour Eiffel, Paris, France | 0.00624152097590896
1923818 | Champ de Mars Paris, France | 0.00838214733539654
5165953 | 39 Rue Champ De Mars Paris, France | 0.00874410234569529
7395870 | 39 Rue Champ De Mars Paris, France | 0.00874410234569529
4358671 | 32 Rue Champ De Mars Paris, France | 0.00876089659276339
1923742 | 12 rue du Champ de Mars Paris, France | 0.00876764731845995
(10 rows)

Time: 7.859 ms

=# EXPLAIN (COSTS OFF)
SELECT id, address FROM spots WHERE coordinates >< '(2.29470491409302,48.858263472125)'::point
AND to_tsvector('french',address) @@ to_tsquery('french','mars') LIMIT 10;

QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Limit
-> Index Scan using spots_idx on spots
Index Cond: ((coordinates >< '(2.29470491409302,48.858263472125)'::point) AND (to_tsvector('french'::regconfig, address) @@ '''mar'''::tsquery))
(3 rows)

On Fri, 8 Apr 2011, PostgreSQL - Hans-J?rgen Sch?nig wrote:

> hello all ...
>
> given oleg's posting before i also wanted to fire up some KNN related question.
> let us consider a simple example. i got some million lines and i want all rows matching a tsquery sorted by price.
> i did some tests:
>
> test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE to_tsvector('german', title) @@ to_tsquery('german', 'iphone') ORDER BY int_price <-> 0 LIMIT 10;
> QUERY PLAN
>
> -----------------------------------------------------------------------------------------------------------------------------
> --------------------------------------
> Limit (cost=0.00..41.11 rows=10 width=16) (actual time=36391.717..45542.590 rows=10 loops=1)
> Buffers: shared hit=9 read=5004
> -> Index Scan using idx_product_t_product_titleprice on t_product (cost=0.00..13251.91 rows=3224 width=16) (actual time=
> 36391.715..45542.573 rows=10 loops=1)
> Index Cond: (to_tsvector('german'::regconfig, title) @@ '''iphon'''::tsquery)
> Order By: (int_price <-> 0::bigint)
> Buffers: shared hit=9 read=5004
> Total runtime: 45542.676 ms
> (7 rows)
>
> test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE to_tsvector('german', title) @@ to_tsquery('german', 'handy') ORDER BY int_price <-> 0 LIMIT 10;
> QUERY PLAN
>
> -----------------------------------------------------------------------------------------------------------------------------
> -------------------------------------
> Limit (cost=0.00..41.03 rows=10 width=16) (actual time=7243.526..10935.227 rows=10 loops=1)
> Buffers: shared hit=3 read=2316
> -> Index Scan using idx_product_t_product_titleprice on t_product (cost=0.00..29762.61 rows=7255 width=16) (actual time=
> 7243.524..10935.217 rows=10 loops=1)
> Index Cond: (to_tsvector('german'::regconfig, title) @@ '''handy'''::tsquery)
> Order By: (int_price <-> 0::bigint)
> Buffers: shared hit=3 read=2316
> Total runtime: 10935.265 ms
> (7 rows)
>
> test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE to_tsvector('german', title) @@ to_tsquery('german', 'handy') ORDER BY int_price <-> 0 LIMIT 1;
> QUERY PLAN
>
> -----------------------------------------------------------------------------------------------------------------------------
> -------------------------------
> Limit (cost=0.00..4.10 rows=1 width=16) (actual time=28.527..28.528 rows=1 loops=1)
> Buffers: shared hit=1 read=1577
> -> Index Scan using idx_product_t_product_titleprice on t_product (cost=0.00..29762.61 rows=7255 width=16) (actual time=
> 28.525..28.525 rows=1 loops=1)
> Index Cond: (to_tsvector('german'::regconfig, title) @@ '''handy'''::tsquery)
> Order By: (int_price <-> 0::bigint)
> Buffers: shared hit=1 read=1577
> Total runtime: 28.558 ms
> (7 rows)
>
>
> under any circumstances - there is no way to reduce the number of buffers needed for a query like that.
> if everything is cached this is still ok but as soon as you have to take a single block from disk you will die a painful random I/O death.
> is there any alternative which does not simply die when i try to achieve what i want?
>
> the use case is quite simple: all products with a certain word (10 cheapest or so).
>
> is there any alternative approach to this?
> i was putting some hope into KNN but it seems it needs too much random I/O :(.
>
> many thanks,
>
> hans
>
> --
> Cybertec Sch?nig & Sch?nig GmbH
> Gr?hrm?hlgasse 26
> A-2700 Wiener Neustadt, Austria
> Web: http://www.postgresql-support.de
>
>
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Oleg Bartunov 2011-04-08 13:24:51 Re: k-neighbourhood search in databases
Previous Message Robert Haas 2011-04-08 13:01:24 Re: switch UNLOGGED to LOGGED