Quick Links

integrated tsearch has different results than tsearch2

From:	"Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com>
To:	"PostgreSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>
Cc:	"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Oleg Bartunov" <oleg(at)sai(dot)msu(dot)su>
Subject:	integrated tsearch has different results than tsearch2
Date:	2007-09-03 07:25:36
Message-ID:	162867790709030025n6448e224x6e86664316247133@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hello

I am testing fulltext.

1. I am not able use fulltext with latin2 encoding :( I missing note
about only utf8 dictionaries in doc).

2. with hspell dictionaries (fresh copy from open office) I got
different and wrong results.

Original (old) result

ts=# select * from ts_debug('Příliš žluťoučký kůň se napil žluté vody');
ts_name | tok_type | description | token | dict_name
| tsvector
--------------+----------+-------------+-----------+
-------------------+ ------------
default_czech | word | Word | Příliš |
{cz_ispell,simple} | 'příliš'
default_czech | word | Word | žluťoučký |
{cz_ispell,simple} | 'žluťoučký'
default_czech | word | Word | kůň | {cz_ispell,simple} | 'kůň'
default_czech | lword | Latin word | se | {cz_ispell,simple} |
default_czech | lword | Latin word | napil |
{cz_ispell,simple} | 'napít'
default_czech | word | Word | žluté |
{cz_ispell,simple} | 'žlutý'
default_czech | lword | Latin word | vody |
{cz_ispell,simple} | 'voda'
(7 řádek)

New results:
postgres=# create Text search dictionary cspell(template=ispell,
dictfile=czech, afffile=czech, stopwords=czech);
CREATE TEXT SEARCH DICTIONARY
postgres=# CREATE text search configuration cs (copy=english);
CREATE TEXT SEARCH CONFIGURATION

postgres=# alter text search configuration cs alter mapping for word,
lword with cspell, simple;
ALTER TEXT SEARCH CONFIGURATION
postgres=# select * from ts_debug('cs','Příliš žluťoučký kůň se napil
žluté vody');
Alias | Description | Token | Dictionaries | Lexized token
-------+---------------+-----------+-----------------+---------------------
word | Word | Příliš | {cspell,simple} | cspell: {příliš}
blank | Space symbols | | {} |
word | Word | žluťoučký | {cspell,simple} | cspell: {žluťoučký}
blank | Space symbols | | {} |
word | Word | kůň | {cspell,simple} | cspell: {kůň}
blank | Space symbols | | {} |
lword | Latin word | se | {cspell,simple} | cspell: {}
blank | Space symbols | | {} |
lword | Latin word | napil | {cspell,simple} | simple: {napil}
blank | Space symbols | | {} |
word | Word | žluté | {cspell,simple} | simple: {žluté}
blank | Space symbols | | {} |
lword | Latin word | vody | {cspell,simple} | simple: {vody}
(13 rows)

This query returned true in 8.2 and now:

postgres=# select to_tsvector('cs','Příliš žlutý kůň se napil žluté
vody') @@ to_tsquery('cs','napít');
?column?
----------
f
(1 row)

Regards
Pavel Stehule

Responses

Re: integrated tsearch has different results than tsearch2 at 2007-09-03 08:46:25 from Oleg Bartunov
Re: integrated tsearch has different results than tsearch2 at 2007-09-03 10:24:31 from Teodor Sigaev

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Trevor Talbot	2007-09-03 07:26:09	Re: tsearch filenames unlikes special symbols and numbers
Previous Message	Anoo Sivadasan Pillai	2007-09-03 07:04:10	Re: FW: max_connections and shared_buffers