
Re: tsearch2 problem

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Jodok Batlogg <jodok(at)lovelysystems(dot)com>
Cc: pgsql-general(at)postgresql(dot)org, Jürgen Kartnaller <juergen(at)lovelysystems(dot)com>
Subject: Re: tsearch2 problem
Date: 2008-10-31 11:31:47
Message-ID: Pine.LNX.4.64.0810311425570.15810@sn.sai.msu.ru
Lists: pgsql-general
On Fri, 31 Oct 2008, Jodok Batlogg wrote:

> hi oleg,
>
> thanks for your quick response,
>
> 2008/10/31 Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>:
>> Jodok,
>>
>> you got what you defined. Please read the documentation.
>> In short, a word is not indexed if it is not recognized by any
>> dictionary in the stack of dictionaries. Put a stemming dictionary,
>> which recognizes everything, at the end of the stack.
>
> can you point me to "the" documentation where i could find that? i
> think i tried hard :)

well, it's not really hard
http://www.postgresql.org/docs/8.3/static/textsearch-dictionaries.html

"A text search configuration binds a parser together with a set of 
dictionaries to process the parser's output tokens. For each token type that 
the parser can return, a separate list of dictionaries is specified by the 
configuration. When a token of that type is found by the parser, each 
dictionary in the list is consulted in turn, until some dictionary recognizes 
it as a known word. If it is identified as a stop word, or if no dictionary 
recognizes the token, it will be discarded and not indexed or searched for. 
The general rule for configuring a list of dictionaries is to place first 
the most narrow, most specific dictionary, then the more general dictionaries, 
finishing with a very general dictionary, like a Snowball stemmer or simple, 
which recognizes everything."
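
Applied to the configuration from the original post, that rule could look
roughly like the sketch below. It assumes the built-in Snowball dictionary
german_stem that ships with 8.3 is an acceptable fallback; the token types
and the public.german dictionary are taken unchanged from Jodok's setup.

```sql
-- Sketch only: same mapping as in the original post, with a Snowball
-- stemmer appended so that words unknown to ispell are still indexed.
ALTER TEXT SEARCH CONFIGURATION public.default
  ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                    word, hword, hword_part
  WITH public.german, german_stem;

-- Re-check how each token is handled after the change.
SELECT * FROM ts_debug('default', 'lovely und bauarbeiter/in');
```

The ispell dictionary stays first because it is the more specific one; the
stemmer only sees tokens that ispell did not recognize.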

>
> however - problem a) is fixed. thanks :)
> nevertheless i still have the problem that words with '/' are being
> interpreted as file paths instead of words. any idea how i could tweak
> this?

several ways:
1. use your own parser
2. use encode/decode functions, which cheat the default parser. For example,
    encodeslash('aa/bb') -> aaxxxxxxbb. But then you should understand that
    a dictionary like ispell will not be able to recognize the encoded word.
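
A minimal sketch of the second option, assuming plain SQL functions built on
replace() are good enough. The name encodeslash comes from the example above;
decodeslash and the 'xxxxxx' placeholder are illustrative assumptions, not
existing functions.

```sql
-- Hypothetical helpers: hide the slash from the default parser so that
-- 'bauarbeiter/in' is no longer tokenized as a file path.
CREATE FUNCTION encodeslash(text) RETURNS text AS $$
  SELECT replace($1, '/', 'xxxxxx');
$$ LANGUAGE sql IMMUTABLE;

-- Inverse mapping, for turning encoded lexemes back into readable text.
CREATE FUNCTION decodeslash(text) RETURNS text AS $$
  SELECT replace($1, 'xxxxxx', '/');
$$ LANGUAGE sql IMMUTABLE;

-- The same encoding has to be applied when indexing and when querying.
SELECT to_tsvector('default', encodeslash('lovely und bauarbeiter/in'));
SELECT to_tsquery('default', encodeslash('bauarbeiter/in'));
```

The encoded word will not be in the ispell dictionary, so it is only indexed
if a dictionary that accepts everything sits at the end of the stack. The
placeholder must also be a string that cannot occur in real text, otherwise
decoding will corrupt it.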


>
> thanks
>
> jodok
>
>>
>> Oleg
>> On Fri, 31 Oct 2008, Jodok Batlogg wrote:
>>
>>> we're using tsearch2 with the german dictionary
>>>
>>> http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/dicts/ispell/ispell-german-compound.tar.gz
>>> for fulltext search.
>>>
>>> the indexing is configured as follows:
>>>
>>> CREATE TEXT SEARCH DICTIONARY public.german (
>>>   TEMPLATE = ispell,
>>>   DictFile = german,
>>>   AffFile = german,
>>>   StopWords = german
>>> );
>>>
>>> CREATE TEXT SEARCH CONFIGURATION public.default ( COPY = pg_catalog.german );
>>>
>>> ALTER TEXT SEARCH CONFIGURATION public.default
>>>   ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
>>>                     word, hword, hword_part
>>>   WITH public.german;
>>>
>>> -------------------------
>>>
>>> select * from ts_debug('default', 'hundshütte');
>>> works as expected: creates the two lexemes: "{hund,hütte}"
>>>
>>> BUT
>>>
>>> SELECT to_tsvector('default','lovely und bauarbeiter/in');
>>> loses a lot of stuff:
>>> "'bauarbeiter/in':2"
>>>
>>> some more debugging shows:
>>>
>>> SELECT * from ts_debug('default','lovely und bauarbeiter/in');
>>>
>>> "asciiword";"Word, all ASCII";"lovely";"{german}";"german";""
>>> "blank";"Space symbols";" ";"{}";"";""
>>> "asciiword";"Word, all ASCII";"und";"{german}";"german";"{}"
>>> "blank";"Space symbols";" ";"{}";"";""
>>> "file";"File or path name";"bauarbeiter/in";"{simple}";"simple";"{bauarbeiter/in}"
>>>
>>> a) unknown words are just being dropped
>>> b) words with slashes are interpreted as file paths and the first path
>>> is being dropped.
>>>
>>> any idea how we can fix this?
>>>
>>> jodok
>>>
>>>
>>
>>        Regards,
>>                Oleg
>> _____________________________________________________________
>> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
>> Sternberg Astronomical Institute, Moscow University, Russia
>> Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
>> phone: +007(495)939-16-83, +007(495)939-23-83
>
>
>
>

 	Regards,
 		Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
