Re: How to switch off Snowball stemmer for tsearch2?

From: "Dmitry Koterov" <dmitry(at)koterov(dot)ru>
To: "Oleg Bartunov" <oleg(at)sai(dot)msu(dot)su>
Cc: "Postgres General" <pgsql-general(at)postgresql(dot)org>
Subject: Re: How to switch off Snowball stemmer for tsearch2?
Date: 2007-08-23 09:56:46
Message-ID: d7df81620708230256m292ae23fk3aeb1c9c9e756c6@mail.gmail.com
Lists: pgsql-general

>
> > Now
> >
> > select lexize('ru_ispell_cp1251', 'Дмитриев') -> "Дмитрий"
> > select lexize('ru_ispell_cp1251', 'Иванов') -> "Иван"
> > - it is completely wrong!
> >
> > I have a database of all Russian names; is it possible to use it (how?)
> > to
>
> if you have such database why just don't write special dictionary and
> put it in front ?

Because it is a database of Russian first NAMES, NOT a database of
surnames.

> > make lexize() not to convert "Ivanov" to "Ivan" even if the ispell
> > dictionary contains an element for "Ivan"? So, this pseudo-code logic is
> > needed:
> >
> > function new_lexize($string) {
> > $stem = lexize('ru_ispell_cp1251', $string);
> > if ($stem in names_database) return $string; else return $stem;
> > }
> >
> > Maybe tsearch2 implements this logic already?
>
> sure, it's how text search mapping works.

Could you please elaborate?
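
To make the question concrete, here is a minimal Python sketch of the logic
from the pseudo-code above. The stem table and the name set are made-up
stand-ins: a real setup would call lexize('ru_ispell_cp1251', ...) and query
the names database instead, and the real lexize() returns an array rather
than a single string.

```python
# Hypothetical stand-in for lexize('ru_ispell_cp1251', word): a tiny lookup table.
ISPELL_STEMS = {
    "Дмитриев": "Дмитрий",
    "Иванов": "Иван",
    "столы": "стол",
}

# Hypothetical stand-in for the database of Russian first names.
FIRST_NAMES = {"Дмитрий", "Иван"}


def lexize(word):
    """Return the stem from the stand-in table, or the word itself if unknown."""
    return ISPELL_STEMS.get(word, word)


def new_lexize(word):
    """Keep the original word when its stem is a first name; otherwise return the stem."""
    stem = lexize(word)
    return word if stem in FIRST_NAMES else stem


print(new_lexize("Иванов"))  # surname is kept as-is
print(new_lexize("столы"))   # ordinary word is still stemmed
```

With this logic "Иванов" stays "Иванов", while ordinary words are still
reduced to their stems.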

Of course, I could generate all word forms of all Russian names using ispell and
then subtract this full list from the ispell dictionary (so I would remove
"Ivan", "Ivanami", etc. from it). But possibly tsearch2 already has this
subtraction algorithm.
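
The manual subtraction could look roughly like the Python sketch below. It
assumes the ispell dictionary has already been expanded into a plain
"word form -> stem" table; that table, the sample entries, and the function
name subtract_names are all hypothetical, and real ispell files (affix plus
dictionary) would need a separate expansion step first.

```python
def subtract_names(form_to_stem, first_names):
    """Drop every word form whose stem is a first name."""
    return {
        form: stem
        for form, stem in form_to_stem.items()
        if stem not in first_names
    }


# Hypothetical expanded dictionary: each word form maps to its stem.
expanded = {
    "Иван": "Иван",
    "Иванами": "Иван",
    "Иванов": "Иван",
    "столы": "стол",
}

filtered = subtract_names(expanded, {"Иван"})
print(sorted(filtered))  # only forms of non-name stems remain
```

After the subtraction the dictionary no longer recognizes "Иванов" as a form
of "Иван", so a later dictionary in the chain (or none at all) would handle
it.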

> Dmitry, seems your company could be my client :)

Not now, thank you. Maybe later.
