Re: questions about tsearch2 (for czech language)

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Pavel Stehule <stehule(at)kix(dot)fsv(dot)cvut(dot)cz>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: questions about tsearch2 (for czech language)
Date: 2003-12-22 11:05:59
Message-ID: Pine.GSO.4.58.0312221401080.14104@ra.sai.msu.su
Lists: pgsql-general

On Mon, 22 Dec 2003, Pavel Stehule wrote:

> Hello
>
> I am trying tsearch2 in a Czech environment. It works fine, but I have two
> questions.
>
> 1. I have the words "se" and "ve" in my Czech stop words, but I still get
> these words in the result. Why? Is there a problem with my configuration?

Did you specify the stop words in the dictionary configuration?

select * from pg_ts_dict;
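
For illustration only, a minimal sketch of how a stop-word file might be
attached to the ispell dictionary through its init options. The file paths
below are hypothetical placeholders, and the sketch assumes the dictionary is
registered as cz_ispell, as in the output quoted below:

```sql
-- Check which init options the Czech ispell dictionary currently has.
SELECT dict_name, dict_initoption
  FROM pg_ts_dict
 WHERE dict_name = 'cz_ispell';

-- Hypothetical paths: substitute your own dictionary, affix and stop-word files.
UPDATE pg_ts_dict
   SET dict_initoption = 'DictFile="/usr/local/share/ispell/czech.dict",'
                      || 'AffFile="/usr/local/share/ispell/czech.aff",'
                      || 'StopFile="/usr/local/share/ispell/czech.stop"'
 WHERE dict_name = 'cz_ispell';
```

If no StopFile entry shows up in dict_initoption, the dictionary has no stop
list to apply, which would be consistent with "se" and "ve" appearing in the
result.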

>
> tsearch2=# select * from ts_debug('jmenuji se Pavel Stěhule a bydlím ve Skalici.');
>     ts_name    | tok_type | description |  token  |  dict_name  | tsvector
> ---------------+----------+-------------+---------+-------------+-----------
>  default_czech | lword    | Latin word  | jmenuji | {cz_ispell} | 'jmenuji'
>  default_czech | lword    | Latin word  | se      | {cz_ispell} | 'se'
>  default_czech | lword    | Latin word  | Pavel   | {cz_ispell} | 'pavel'
>  default_czech | word     | Word        | Stěhule | {cz_ispell} |
>  default_czech | lword    | Latin word  | a       | {cz_ispell} |
>  default_czech | word     | Word        | bydlím  | {cz_ispell} | 'bydlet'
>  default_czech | lword    | Latin word  | ve      | {cz_ispell} | 've'
>  default_czech | lword    | Latin word  | Skalici | {cz_ispell} | 'skalici'
> (8 rows)
>
> tsearch2=# select * from pg_ts_cfgmap where ts_name='default_czech';
>     ts_name    |  tok_alias   |  dict_name
> ---------------+--------------+-------------
>  default_czech | email        | {simple}
>  default_czech | file         | {simple}
>  default_czech | float        | {simple}
>  default_czech | host         | {simple}
>  default_czech | hword        | {cz_ispell}
>  default_czech | int          | {simple}
>  default_czech | lhword       | {cz_ispell}
>  default_czech | lpart_hword  | {cz_ispell}
>  default_czech | lword        | {cz_ispell}
>  default_czech | nlhword      | {cz_ispell}
>  default_czech | nlpart_hword | {cz_ispell}
>  default_czech | nlword       | {cz_ispell}
>  default_czech | part_hword   | {simple}
>  default_czech | sfloat       | {simple}
>  default_czech | uint         | {simple}
>  default_czech | uri          | {simple}
>  default_czech | url          | {simple}
>  default_czech | version      | {simple}
>  default_czech | word         | {cz_ispell}
> (19 rows)
>
> 2. I use a small Czech dictionary. I need to keep words which aren't in the
> dictionary (in my sample, Stěhule) instead of having them erased. Can I set
> this somewhere? I tried adding the simple dict to the cfg map, but without
> success.
>

An example, please! What do you mean by 'erase words'?

> tsearch2=# select * from ts_debug('jmenuji se Pavel Stěhule a bydlím ve Skalici.');
>     ts_name    | tok_type | description |  token  |     dict_name      | tsvector
> ---------------+----------+-------------+---------+--------------------+-----------
>  default_czech | word     | Word        | Stěhule | {cz_ispell,simple} |
>  default_czech | lword    | Latin word  | a       | {cz_ispell,simple} |
>  default_czech | word     | Word        | bydlím  | {cz_ispell,simple} | 'bydlet'
>
>
> Thank You
> Pavel Stehule

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
