Re: integrated tsearch has different results than tsearch2

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: integrated tsearch has different results than tsearch2
Date: 2007-09-03 08:46:25
Message-ID: Pine.LNX.4.64.0709031245430.2767@sn.sai.msu.ru
Lists: pgsql-hackers

Pavel,

I can't read your posting. Can you use plain text format?

Oleg
On Mon, 3 Sep 2007, Pavel Stehule wrote:

> Hello
> I am testing fulltext.
> 1. I am not able to use fulltext with latin2 encoding :( (I am missing a note about UTF-8-only dictionaries in the docs).
>
> 2. With hspell dictionaries (a fresh copy from OpenOffice) I got different and wrong results.
> Original (old) result
> ts=# select * from ts_debug('P??li? ?lu?ou?k? k?? se napil ?lut? vody');
>     ts_name    | tok_type | description |   token   |     dict_name      |  tsvector
> ---------------+----------+-------------+-----------+--------------------+-------------
>  default_czech | word     | Word        | P??li?    | {cz_ispell,simple} | 'p??li?'
>  default_czech | word     | Word        | ?lu?ou?k? | {cz_ispell,simple} | '?lu?ou?k?'
>  default_czech | word     | Word        | k??       | {cz_ispell,simple} | 'k??'
>  default_czech | lword    | Latin word  | se        | {cz_ispell,simple} |
>  default_czech | lword    | Latin word  | napil     | {cz_ispell,simple} | 'nap?t'
>  default_czech | word     | Word        | ?lut?     | {cz_ispell,simple} | '?lut?'
>  default_czech | lword    | Latin word  | vody      | {cz_ispell,simple} | 'voda'
> (7 ??dek)
>
> New results:
> postgres=# create Text search dictionary cspell(template=ispell,dictfile=czech, afffile=czech, stopwords=czech);
> CREATE TEXT SEARCH DICTIONARY
> postgres=# CREATE text search configuration cs (copy=english);
> CREATE TEXT SEARCH CONFIGURATION
> postgres=# alter text search configuration cs alter mapping for word,lword with cspell, simple;
> ALTER TEXT SEARCH CONFIGURATION
> postgres=# select * from ts_debug('cs','P??li? ?lu?ou?k? k?? se napil ?lut? vody');
>  Alias |  Description  |   Token   |  Dictionaries   |    Lexized token
> -------+---------------+-----------+-----------------+---------------------
>  word  | Word          | P??li?    | {cspell,simple} | cspell: {p??li?}
>  blank | Space symbols |           | {}              |
>  word  | Word          | ?lu?ou?k? | {cspell,simple} | cspell: {?lu?ou?k?}
>  blank | Space symbols |           | {}              |
>  word  | Word          | k??       | {cspell,simple} | cspell: {k??}
>  blank | Space symbols |           | {}              |
>  lword | Latin word    | se        | {cspell,simple} | cspell: {}
>  blank | Space symbols |           | {}              |
>  lword | Latin word    | napil     | {cspell,simple} | simple: {napil}
>  blank | Space symbols |           | {}              |
>  word  | Word          | ?lut?     | {cspell,simple} | simple: {?lut?}
>  blank | Space symbols |           | {}              |
>  lword | Latin word    | vody      | {cspell,simple} | simple: {vody}
> (13 rows)
>
> This query returned true in 8.2 and now:
> postgres=# select to_tsvector('cs','P??li? ?lut? k?? se napil ?lut? vody') @@ to_tsquery('cs','nap?t');
>  ?column?
> ----------
>  f
> (1 row)
>
> Regards
> Pavel Stehule
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
