Re: tsearch2 problem

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Jodok Batlogg <jodok(at)lovelysystems(dot)com>
Cc: pgsql-general(at)postgresql(dot)org, Jürgen Kartnaller <juergen(at)lovelysystems(dot)com>
Subject: Re: tsearch2 problem
Date: 2008-10-31 11:31:47
Message-ID: Pine.LNX.4.64.0810311425570.15810@sn.sai.msu.ru
Lists: pgsql-general

On Fri, 31 Oct 2008, Jodok Batlogg wrote:

> hi oleg,
>
> thanks for your quick response,
>
> 2008/10/31 Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>:
>> Jodok,
>>
>> you got what you defined. Please read the documentation.
>> In short, a word is not indexed if it is not recognized by any
>> dictionary in the stack of dictionaries. Put a stemming dictionary,
>> which recognizes everything, at the end.
>
> can you point me to "the" documentation where i could find that? i
> think i tried hard :)

well, it's not really hard
http://www.postgresql.org/docs/8.3/static/textsearch-dictionaries.html

"A text search configuration binds a parser together with a set of
dictionaries to process the parser's output tokens. For each token type that
the parser can return, a separate list of dictionaries is specified by the
configuration. When a token of that type is found by the parser, each
dictionary in the list is consulted in turn, until some dictionary recognizes
it as a known word. If it is identified as a stop word, or if no dictionary
recognizes the token, it will be discarded and not indexed or searched for.
The general rule for configuring a list of dictionaries is to place first
the most narrow, most specific dictionary, then the more general dictionaries,
finishing with a very general dictionary, like a Snowball stemmer or simple,
which recognizes everything."
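
Applied to the mapping from your first mail, that rule could look roughly
like the sketch below. It assumes the built-in Snowball dictionary
german_stem (shipped in pg_catalog) is acceptable as the catch-all at the
end of the list; "simple" would be the other obvious choice. I have not run
this against your data.

```sql
-- Sketch only: keep the ispell dictionary first and add a dictionary that
-- recognizes everything as the last entry, so unknown words are stemmed
-- instead of being discarded.
-- Assumption: german_stem is the desired fallback dictionary.
ALTER TEXT SEARCH CONFIGURATION public.default
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
    WITH public.german, german_stem;

-- Check which dictionary now handles each token.
SELECT * FROM ts_debug('default', 'lovely und bauarbeiter/in');
```

Only the token types listed in the mapping are affected; the "file" token
type keeps whatever dictionary list it had before.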

>
> however - problem a) is fixed. thanks :)
> nevertheless i still have the problem that words with '/' are being
> interpreted as file paths instead of words. any idea how i could tweak
> this?

several ways:
1. use your own parser
2. use encode/decode functions that trick the default parser. For example,
encodeslash('aa/bb') -> aaxxxxxxbb. But be aware that a dictionary like
ispell will then not be able to recognize the encoded word.
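
A minimal sketch of the second approach follows. The functions encodeslash
and decodeslash are hypothetical helpers, not something shipped with
PostgreSQL, and the marker string 'xxxxxx' is just the one from my example
above; pick a marker that cannot occur in your real text.

```sql
-- Hypothetical helper: hide the slash so the default parser sees one
-- plain word instead of a file path.
CREATE OR REPLACE FUNCTION encodeslash(text) RETURNS text AS $$
    SELECT replace($1, '/', 'xxxxxx');
$$ LANGUAGE sql IMMUTABLE;

-- Hypothetical helper: restore the slash, e.g. before displaying a word.
CREATE OR REPLACE FUNCTION decodeslash(text) RETURNS text AS $$
    SELECT replace($1, 'xxxxxx', '/');
$$ LANGUAGE sql IMMUTABLE;

-- The same encoding has to be applied when indexing and when searching,
-- otherwise the lexemes will not match.
SELECT to_tsvector('default', encodeslash('lovely und bauarbeiter/in'));
SELECT to_tsquery('default', encodeslash('bauarbeiter/in'));
```

The encoded word only survives if the dictionary list for its token type
ends with a dictionary that recognizes everything, since ispell will not
know it.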

>
> thanks
>
> jodok
>
>>
>> Oleg
>> On Fri, 31 Oct 2008, Jodok Batlogg wrote:
>>
>>> we're using tsearch2 with the german dictionary
>>>
>>> http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/dicts/ispell/ispell-german-compound.tar.gz
>>> for fulltext search.
>>>
>>> the indexing is configured as follows:
>>>
>>> CREATE TEXT SEARCH DICTIONARY public.german (
>>>     TEMPLATE = ispell,
>>>     DictFile = german,
>>>     AffFile = german,
>>>     StopWords = german
>>> );
>>>
>>> CREATE TEXT SEARCH CONFIGURATION public.default ( COPY = pg_catalog.german );
>>>
>>> ALTER TEXT SEARCH CONFIGURATION public.default
>>>     ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
>>>                       word, hword, hword_part
>>>     WITH public.german;
>>>
>>> -------------------------
>>>
>>> select * from ts_debug('default', 'hundshütte');
>>> works as expected: creates the two lexemes: "{hund,hütte}"
>>>
>>> BUT
>>>
>>> SELECT to_tsvector('default','lovely und bauarbeiter/in');
>>> loses a lot of stuff:
>>> "'bauarbeiter/in':2"
>>>
>>> some more debugging shows:
>>>
>>> SELECT * from ts_debug('default','lovely und bauarbeiter/in');
>>>
>>> "asciiword";"Word, all ASCII";"lovely";"{german}";"german";""
>>> "blank";"Space symbols";" ";"{}";"";""
>>> "asciiword";"Word, all ASCII";"und";"{german}";"german";"{}"
>>> "blank";"Space symbols";" ";"{}";"";""
>>> "file";"File or path name";"bauarbeiter/in";"{simple}";"simple";"{bauarbeiter/in}"
>>>
>>> a) unknown words are just being dropped
>>> b) words with slashes are interpreted as file paths and the first path
>>> is being dropped.
>>>
>>> any idea how we can fix this?
>>>
>>> jodok
>>>
>>>
>>
>> Regards,
>> Oleg
>> _____________________________________________________________
>> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
>> Sternberg Astronomical Institute, Moscow University, Russia
>> Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
>> phone: +007(495)939-16-83, +007(495)939-23-83
>
>
>
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
