Re: Use of ISpell dictionaries with tsearch2 - what is

From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Don Walker <don(dot)walker(at)versaterm(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Use of ISpell dictionaries with tsearch2 - what is
Date: 2006-05-01 14:30:51
Message-ID: 44561B9B.70004@sigaev.ru
Lists: pgsql-general

> 1. If I am correct about this then what is the point of using the ISpell
> dictionary in the first place?

Yes. The main goal of any dictionary is to normalize a lexeme, i.e. to
reduce it to its base form (for verbs, the infinitive). This is very
important for languages with many word forms, such as French, Russian,
Norwegian, etc. So, when dictionaries are used, the user doesn't have
to think about the exact form of a word when searching.
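To illustrate the point, here is a minimal Python sketch. It assumes a hypothetical hand-written lexicon (NORMAL_FORMS) in place of a real dictionary. Document words and query words are both reduced to a normal form before comparison, so the query 'mangé' finds a document containing 'mangeons'.

```python
# Hypothetical toy lexicon mapping inflected French forms to the infinitive.
# A real dictionary (ispell or snowball) would derive these, not list them.
NORMAL_FORMS = {
    "mange": "manger",
    "mangeons": "manger",
    "mangé": "manger",
    "parlent": "parler",
}


def normalize(word: str) -> str:
    """Return the normal form of a word, or the lowercased word if unknown."""
    w = word.lower()
    return NORMAL_FORMS.get(w, w)


def matches(document: str, query: str) -> bool:
    """True if every query word equals some document word after normalization."""
    doc_lexemes = {normalize(w) for w in document.split()}
    return all(normalize(q) in doc_lexemes for q in query.split())


print(matches("nous mangeons", "mangé"))  # both sides normalize to "manger"
```

Without normalize(), the same query would find nothing, because the surface forms 'mangé' and 'mangeons' differ.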

There are two basic approaches to dictionaries: stemming and
vocabulary-based. The first tries to strip the variable ending of a
word; in tsearch2 these are the Snowball dictionaries. The second is
ispell: it tries to find the word in a vocabulary, applying some
grammatical (affix) rules.
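A toy Python sketch contrasting the two approaches. Both functions and their rule lists (SUFFIXES, VOCABULARY, AFFIX_RULES) are hypothetical and far simpler than Snowball or real ispell affix files. They only show the difference in behavior: the stemmer handles any word but may return a non-word, while the vocabulary-based lookup returns real base forms, but only for words it knows.

```python
# Hypothetical suffix list for a toy stemmer (NOT the Snowball algorithm).
SUFFIXES = ["ing", "es", "ed", "s"]


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least 3 characters of stem."""
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) - len(suf) >= 3:
            return word[: -len(suf)]
    return word


# Hypothetical base-form vocabulary plus (suffix, replacement) rules,
# loosely in the spirit of an ispell dictionary + affix file.
VOCABULARY = {"run", "study", "fly"}
AFFIX_RULES = [("ies", "y"), ("ning", ""), ("ing", ""), ("s", "")]


def toy_ispell(word: str):
    """Return vocabulary base forms reachable by undoing one affix rule."""
    results = []
    if word in VOCABULARY:
        results.append(word)
    for suffix, replacement in AFFIX_RULES:
        if word.endswith(suffix):
            candidate = word[: -len(suffix)] + replacement
            if candidate in VOCABULARY and candidate not in results:
                results.append(candidate)
    return results


for w in ["studies", "running", "xyzzy"]:
    print(w, toy_stem(w), toy_ispell(w))
```

Here toy_stem("studies") gives "studi" while toy_ispell("studies") gives ["study"]; for an unknown word like "xyzzy" the stemmer returns it unchanged and the vocabulary lookup returns an empty list.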

>
> 2. Is there a solution for correcting spelling mistakes in the documents you
> index? I have seen the readme files for pg_trgm,
> http://www.sai.msu.su/~megera/postgres/gist/, which would allow me to
> suggest other terms for a query if the misspellings were common enough. I'd
> rather fix the problem at index time so that querying with the proper term
> would find any misspelled terms (within reason).

It's possible, but it may produce unpredictable search results. An
example off the top of my head (sorry, it's Russian):

horosho - good ('sh' in Russian is one character)
herovo - bad (slang)

horovo - where is the typo, in the second character or the fifth? If we
correct it to one or both variants, a user searching for 'good' will
get 'bad'...
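A naive index-time corrector that snaps a word to the nearest vocabulary entry by edit distance shows the problem. This is a minimal Python sketch, assuming a hypothetical two-word vocabulary and a Levenshtein-distance threshold of 1; it is not how tsearch2 or pg_trgm work. Written in Cyrillic, where 'ш' (sh) is a single character, 'хорово' (horovo) is exactly one edit away from both words, so the corrector cannot choose.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


# Hypothetical vocabulary: horosho ("good") and herovo ("bad", slang).
VOCAB = {"хорошо": "good", "херово": "bad (slang)"}


def corrections(word: str, max_distance: int = 1):
    """All vocabulary words within max_distance edits of word."""
    return sorted(w for w in VOCAB if levenshtein(word, w) <= max_distance)


print(corrections("хорово"))  # both candidates qualify
```

Because corrections("хорово") returns both "хорошо" and "херово", indexing either (or both) as the corrected form would let a query for 'good' match a document that meant 'bad'.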

> 2.1 Are there any canned synonym dictionaries available that deal with
> misspellings in English and/or French?
> 2.2 Are there any clever linguistic algorithms that can partly solve
> the same problem?

Ask the linguists :).
