From: | "Guillaume Smet" <guillaume(dot)smet(at)gmail(dot)com> |
---|---|
To: | "Oleg Bartunov" <oleg(at)sai(dot)msu(dot)su> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: n-gram search function |
Date: | 2007-02-18 22:41:16 |
Message-ID: | 1d4e0c10702181441o106f7998p1bd7e5d59a86754@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi Oleg,
On 2/17/07, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> wrote:
> 3-gram is implemented as a contrib/pg_trgm. It currently uses GiST index,
> but may be enhanced with the GiN.
As I'm facing the same problem, I've taken a look to pg_trgm. At the
moment, my opinion is quite mixed but perhaps I did something wrong.
I have a table (100k rows) with a location name in it generally
composed of several words but not that long. I created the index
directly on this column (ie I don't create a table with each word of
the location name). Then I tried a few queries.
Here is an example:
prod=# explain analyze select nomlieu from lieu where nomlieu ilike '%gaumont%';
QUERY PLAN
-----------------------------------------------------------------------------------------------------
Seq Scan on lieu (cost=0.00..7230.20 rows=7 width=21) (actual
time=7.768..556.930 rows=39 loops=1)
Filter: ((nomlieu)::text ~~* '%gaumont%'::text)
Total runtime: 557.066 ms
(3 rows)
_prod=# explain analyze select nomlieu from lieu where nomlieu % 'gaumont';
QUERY
PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on lieu (cost=3.37..200.80 rows=106 width=21)
(actual time=689.799..690.035 rows=36 loops=1)
Recheck Cond: ((nomlieu)::text % 'gaumont'::text)
-> Bitmap Index Scan on idx_lieu_nomlieu_trgm (cost=0.00..3.37
rows=106 width=0) (actual time=689.749..689.749 rows=36 loops=1)
Index Cond: ((nomlieu)::text % 'gaumont'::text)
Total runtime: 690.195 ms
(5 rows)
The trigram version is slower and doesn't return 3 results I should
have. The 3 results it doesn't return have the word gaumont in them at
the start of the string exactly like the others.
Is there anything I can do to improve the performances and investigate
why I don't have these 3 results?
Thanks.
--
Guillaume
From | Date | Subject | |
---|---|---|---|
Next Message | Merlin Moncure | 2007-02-18 22:41:23 | Re: Plan invalidation design |
Previous Message | Joshua D. Drake | 2007-02-18 22:27:49 | Re: New feature request: FlashBack Query |