Re: Fast tsearch2, trigram matching on short phrases

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Carlo Stonebanks <stonec(dot)register(at)sympatico(dot)ca>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Fast tsearch2, trigram matching on short phrases
Date: 2007-08-22 18:48:44
Message-ID: Pine.LNX.4.64.0708222248020.2727@sn.sai.msu.ru
Lists: pgsql-performance

On Wed, 22 Aug 2007, Carlo Stonebanks wrote:

> I have read that trigram matching (similarity()) performance degrades when
> the matching is on longer strings such as phrases. I need to quickly match
> strings and rate them by similarity. The strings are typically one to seven
> words in length - and will often include unconventional abbreviations and
> misspellings.
>
> I have a stored function which does more thorough testing of the phrases,
> including spelling correction, abbreviation translation, etc... and scores
> the results - I pick the winning score that passes a pass/fail constant.
> However, the function is slow. My solution was to reduce the number of rows
> that are passed to the function by pruning obvious mismatches using
> similarity(). However, trigram matching on phrases is slow as well.

You didn't show us the EXPLAIN ANALYZE output of your SELECT.
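
For readers following the thread, the pruning step described in the quoted
paragraph above can be sketched roughly as follows. This is only an
illustration in Python, not the poster's stored function and not the pg_trgm
source: the names trigrams, similarity and prune are hypothetical, the
ASCII-only word splitting is a simplification, and the 0.3 cutoff is an
assumed threshold. It does follow the documented pg_trgm convention of
lowercasing, padding each word with two leading spaces and one trailing
space, and scoring as shared trigrams over total distinct trigrams, which is
why the score always falls between 0 and 1 and can be tested against a fixed
value.

```python
import re


def trigrams(text):
    """Return the set of padded, lowercased trigrams for every word in text."""
    grams = set()
    # Simplification: only ASCII letters and digits count as word characters.
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "  # two leading spaces, one trailing space
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams; result is in [0, 1]."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def prune(query, candidates, threshold=0.3):
    """Keep candidates scoring at or above the (assumed) threshold, best first."""
    scored = [(similarity(query, c), c) for c in candidates]
    return sorted((pair for pair in scored if pair[0] >= threshold), reverse=True)
```

Only the rows that survive prune() would then be handed to the slower,
more thorough scoring function. Note that the trigram set grows with the
length of the phrase, so each comparison does more work on longer strings.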

>
> I have experimented with tsearch2 but I have two problems:
>
> 1) I need a "score" so I can decide if a match passed or failed. Trigram
> similarity() has a fixed result that you can test, but I don't know if rank()
> returns results that can be compared to a fixed value.
>
> 2) I need an efficient methodology to create vectors based on trigrams, and a
> way to create an index to support it. My tsearch2 experiment with normal
> vectors used gist(text tsvector) and an on insert/update trigger to populate
> the vector field.
>
> Any suggestions on where to go with this project to improve performance would
> be greatly appreciated.
>
> Carlo

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
