Re: tsearch2 problem

From: "Jodok Batlogg" <jodok(at)lovelysystems(dot)com>
To: "Oleg Bartunov" <oleg(at)sai(dot)msu(dot)su>
Cc: pgsql-general(at)postgresql(dot)org, Jürgen Kartnaller <juergen(at)lovelysystems(dot)com>
Subject: Re: tsearch2 problem
Date: 2008-10-31 10:30:09
Message-ID: 47b22fd00810310330n7fc6ca61i15964f7de32038e0@mail.gmail.com
Lists: pgsql-general

hi oleg,

thanks for your quick response,

2008/10/31 Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>:
> Jodok,
>
> you got what you defined. Please read the documentation.
> In short, a word isn't indexed if it is not recognized by any
> dictionary in the stack of dictionaries. Put a stemming dictionary at the end,
> which recognizes everything.

can you point me to "the" documentation where i could find that? i
think i tried hard :)
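
for the archives, a minimal sketch of how i read that advice, assuming the
built-in snowball dictionary pg_catalog.german_stem as the catch-all at the
end of the stack (that choice is my assumption, oleg didn't name one):

```sql
-- ispell dictionary first; words it does not recognize fall through
-- to the snowball stemmer, which accepts every word (assumed fallback).
ALTER TEXT SEARCH CONFIGURATION public.default
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
    WITH public.german, pg_catalog.german_stem;
```

the order matters: dictionaries are consulted left to right, so the
stemmer only sees what the ispell dictionary passed on.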

however - problem a) is fixed. thanks :)
nevertheless i still have the problem that words with '/' are being
interpreted as file paths instead of words. any idea how i could tweak
this?
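
one workaround that might do (an assumption on my side, not something
confirmed in this thread): as far as i know the default parser's token
rules can't be reconfigured, so the slash could be replaced with a space
before the text reaches to_tsvector, and likewise on the query side:

```sql
-- Split "bauarbeiter/in" into two plain words so the parser no longer
-- classifies it as a file path.
SELECT to_tsvector('default', replace('lovely und bauarbeiter/in', '/', ' '));

-- Apply the same replacement to the search string.
SELECT plainto_tsquery('default', replace('bauarbeiter/in', '/', ' '));
```

this would of course also break up real file paths, so it only fits text
where slashes separate words.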

thanks

jodok

>
> Oleg
> On Fri, 31 Oct 2008, Jodok Batlogg wrote:
>
>> we're using tsearch2 with the german dictionary
>>
>> http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/dicts/ispell/ispell-german-compound.tar.gz
>> for fulltext search.
>>
>> the indexing is configured as follows:
>>
>> CREATE TEXT SEARCH DICTIONARY public.german (
>> TEMPLATE = ispell,
>> DictFile = german,
>> AffFile = german,
>> StopWords = german
>> );
>>
>> CREATE TEXT SEARCH CONFIGURATION public.default ( COPY = pg_catalog.german );
>>
>> ALTER TEXT SEARCH CONFIGURATION public.default
>> ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
>> word, hword, hword_part
>> WITH public.german;
>>
>> -------------------------
>>
>> select * from ts_debug('default', 'hundshütte');
>> works as expected: creates the two lexemes: "{hund,hütte}"
>>
>> BUT
>>
>> SELECT to_tsvector('default','lovely und bauarbeiter/in');
>> loses a lot of stuff:
>> "'bauarbeiter/in':2"
>>
>> some more debugging shows:
>>
>> SELECT * from ts_debug('default','lovely und bauarbeiter/in');
>>
>> "asciiword";"Word, all ASCII";"lovely";"{german}";"german";""
>> "blank";"Space symbols";" ";"{}";"";""
>> "asciiword";"Word, all ASCII";"und";"{german}";"german";"{}"
>> "blank";"Space symbols";" ";"{}";"";""
>> "file";"File or path name";"bauarbeiter/in";"{simple}";"simple";"{bauarbeiter/in}"
>>
>> a) unknown words are just being dropped
>> b) words with slashes are interpreted as file paths and the first path
>> is being dropped.
>>
>> any idea how we can fix this?
>>
>> jodok
>>
>>
>
> Regards,
> Oleg
> _____________________________________________________________
> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
> Sternberg Astronomical Institute, Moscow University, Russia
> Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
> phone: +007(495)939-16-83, +007(495)939-23-83

--
Jodok Batlogg, Vorstand

Lovely Systems AG
Telefon +43 5572 908060, Fax +43 5572 908060-77, Mobil +43 664 9636963
Schmelzhütterstraße 26a, 6850 Dornbirn, Austria

Sitz: Dornbirn, FB: Landesgericht Feldkirch, FN: 208859x, UID: ATU51736705
Aufsichtsratsvorsitzender: Christian Lutz
Vorstand: Jodok Batlogg, Manfred Schwendinger
