Skip site navigation (1) Skip section navigation (2)

Re: Fast tsearch2, trigram matching on short phrases

From: "Carlo Stonebanks" <stonec(dot)register(at)sympatico(dot)ca>
To: "'Oleg Bartunov'" <oleg(at)sai(dot)msu(dot)su>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Fast tsearch2, trigram matching on short phrases
Date: 2007-08-22 21:09:52
Message-ID: 0BC8501AC4E64C528432FBE619AA59AD@serenity (view raw or flat)
Thread:
Lists: pgsql-performance
>> The problem is in idea, not in performance.

Oh, I think we both agree on that! ;-D 

This is why I didn't post any EXPLAINs or anything like that. I thought the
problem was in the entire method of how to best zero in on the set of
records best suited for closer analysis by my phrase-matching function.

Since you asked about an EXPLAIN ANALYZE, I put together some results for
for you. I added a pg_trgm index to the table to show the performance of
GiST indexes, and did another based on exclusively on similarity(). But I
don't think that you will be surprised by what you see.

As you say, the problem is in the idea - but no matter what, I need to be
able to match phrases that will have all sorts of erratic abbreviations and
misspellings - and I have to do it at very high speeds.

I would appreciate any suggestions you might have.

Carlo



select similarity('veterans''s affairs', name) as sim, name
from institution 
where name % 'veterans''s affairs'
order by sim desc

Sort  (cost=4068.21..4071.83 rows=1446 width=23) (actual
time=4154.962..4155.006 rows=228 loops=1)
  Sort Key: similarity('veterans''s affairs'::text, (name)::text)
  ->  Bitmap Heap Scan on institution  (cost=75.07..3992.31 rows=1446
width=23) (actual time=4152.825..4154.754 rows=228 loops=1)
        Recheck Cond: ((name)::text % 'veterans''s affairs'::text)
        ->  Bitmap Index Scan on institution_name_trgm_idx
(cost=0.00..74.71 rows=1446 width=0) (actual time=4152.761..4152.761
rows=228 loops=1)
              Index Cond: ((name)::text % 'veterans''s affairs'::text)
Total runtime: 4155.127 ms

select name
from institution 
where 
   similarity('veterans''s affairs', name) > 0.5
order by similarity('veterans''s affairs', name) > 0.5

Sort  (cost=142850.08..144055.17 rows=482036 width=23) (actual
time=12603.745..12603.760 rows=77 loops=1)
  Sort Key: (similarity('veterans''s affairs'::text, (name)::text) >
0.5::double precision)
  ->  Seq Scan on institution  (cost=0.00..97348.81 rows=482036 width=23)
(actual time=2032.439..12603.370 rows=77 loops=1)
        Filter: (similarity('veterans''s affairs'::text, (name)::text) >
0.5::double precision)
Total runtime: 12603.818 ms



In response to

pgsql-performance by date

Next:From: Farhan MughalDate: 2007-08-22 21:58:40
Subject: Re: Long running transaction in pg_activity_log
Previous:From: Tobias BroxDate: 2007-08-22 21:08:58
Subject: Re: Long running transaction in pg_activity_log

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group