Re: tsearch2 problem

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Ivan Sergio Borgonovo <mail(at)webthatworks(dot)it>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: tsearch2 problem
Date: 2008-10-31 11:40:29
Message-ID: Pine.LNX.4.64.0810311432430.15810@sn.sai.msu.ru
Lists: pgsql-general

Sergio,

On Fri, 31 Oct 2008, Ivan Sergio Borgonovo wrote:

> On Fri, 31 Oct 2008 13:10:20 +0300 (MSK)
> Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> wrote:
>
>> Jodok,
>>
>> you got what you defined. Please read the documentation.
>> In short, a word isn't indexed if it isn't recognized by any
>> dictionary in the dictionary stack. Put a stemming dictionary
>> at the end; it recognizes everything.
>
> Could you rephrase?
> I have a similar situation. The real solution would be to have two or
> more tsvectors (English and Italian), but that currently looks too
> costly to implement.
>
> I'd like to have "proper full support" for English, so that e.g. it
> recognises plurals etc., and "acceptable" support for Italian, so
> that if I choose a word that's not in the English dictionary, at
> least it is put "as is" in the tsvector.

So, what's the problem? Create a custom configuration with a dictionary
stack like ispell_en, ispell_it, english_stem.
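A minimal sketch of such a configuration, assuming PostgreSQL 8.3+ built-in text search and that Ispell dictionary files for both languages have been installed in $SHAREDIR/tsearch_data as english.dict/english.affix and italian.dict/italian.affix (those file names and the configuration name en_it are assumptions). Dictionaries are consulted in order, and english_stem goes last because it accepts every word, so anything neither Ispell dictionary knows is still indexed.

```sql
-- Assumption: Ispell files english.dict/.affix and italian.dict/.affix
-- have been installed in $SHAREDIR/tsearch_data.
CREATE TEXT SEARCH DICTIONARY ispell_en (
    TEMPLATE  = ispell,
    DictFile  = english,
    AffFile   = english,
    StopWords = english
);

CREATE TEXT SEARCH DICTIONARY ispell_it (
    TEMPLATE  = ispell,
    DictFile  = italian,
    AffFile   = italian,
    StopWords = italian
);

-- Start from the built-in English configuration (hypothetical name en_it).
CREATE TEXT SEARCH CONFIGURATION en_it (COPY = pg_catalog.english);

-- Dictionaries are tried left to right; the first one that recognizes
-- a token wins. english_stem recognizes everything, so it must be last.
ALTER TEXT SEARCH CONFIGURATION en_it
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
    WITH ispell_en, ispell_it, english_stem;
```

The new configuration would then be used in place of 'pg_catalog.english', e.g. to_tsvector('public.en_it', ...).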

Unfortunately, a stemmer is a very general dictionary (it recognizes every
word), so you can't stack two stemmers: the first one would consume
everything. But you can always write your own dictionary that calls
english_stem or italian_stem depending on the word. There are several
open-source language recognizers available, like TextCat
http://odur.let.rug.nl/~vannoord/TextCat/, or another implementation
http://www.mnogosearch.org/guesser/
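A real custom dictionary has to be written in C against the dictionary template interface. The SQL function below only sketches the per-word dispatch idea, using the built-in ts_lexize with the english_stem and italian_stem dictionaries; it assumes the language code ('it' or anything else) comes from an external guesser such as TextCat, and the function name lexize_by_lang is hypothetical.

```sql
-- Hypothetical helper: stem one word with the stemmer matching a
-- language code supplied by an external language guesser.
-- $1 = word, $2 = language code ('it' selects Italian, else English).
CREATE FUNCTION lexize_by_lang(word text, lang text)
RETURNS text[] AS $$
    SELECT CASE $2
               WHEN 'it' THEN ts_lexize('italian_stem', $1)
               ELSE ts_lexize('english_stem', $1)
           END;
$$ LANGUAGE sql STABLE;

-- Example call:
SELECT lexize_by_lang('biciclette', 'it');
```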

BTW, it could be a good contribution.

>
> I've built the tsvectors similarly to:
> setweight(
> to_tsvector('pg_catalog.english',
> coalesce(FilterCode(catalog_items.Code),'')
> ), 'A')
>
> I didn't do any tsearch2 setup; I just installed it and started using
> to_tsvector, to_tsquery and similar functions.
>
> If I run Italian words through to_ts* they mostly remain as they are,
> with some exceptions where they overlap with English.
>
> So far it looks like an acceptable compromise, but I'd rather not
> have surprises before I find the resources to actually do what
> should be done (fully support both languages).
>
>
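To check for such surprises word by word, ts_debug (built in since PostgreSQL 8.3) shows which dictionary handled each token and which lexemes it produced. A minimal sketch; the sample Italian phrase is arbitrary.

```sql
-- Inspect how the stock English configuration treats Italian input.
-- english_stem applies English suffix-stripping rules to every word it
-- sees, so some Italian words may come out altered rather than "as is".
SELECT token, dictionary, lexemes
FROM ts_debug('pg_catalog.english', 'le biciclette rosse sono veloci');
```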

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
