Re: [to_tsvector] German Compound Words

From: "Sven R(dot) Kunze" <srkunze(at)tbz-pariv(dot)de>
To: obartunov(at)gmail(dot)com
Cc: Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: [to_tsvector] German Compound Words
Date: 2015-05-28 15:34:44
Message-ID: 55673594.2050403@tbz-pariv.de
Lists: pgsql-general

Sure. Here you are:

=# select ts_debug('public.german_compound', 'wasserkraft');
ts_debug
-----------------------------------------------------------------------------------------------------
(asciiword,"Word, all ASCII",wasserkraft,"{german_hunspell,german_stem}",german_stem,{wasserkraft})

=# select ts_debug('public.german_compound', 'schifffahrt');
ts_debug
---------------------------------------------------------------------------------------------------------
(asciiword,"Word, all ASCII",schifffahrt,"{german_hunspell,german_stem}",german_hunspell,{schifffahrt})

=# select ts_debug('public.german_compound', 'blindflansch');
ts_debug
-------------------------------------------------------------------------------------------------------
(asciiword,"Word, all ASCII",blindflansch,"{german_hunspell,german_stem}",german_stem,{blindflansch})
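
To see what the Hunspell dictionary returns on its own, independent of the configuration, each word can also be passed to ts_lexize directly. This is a minimal sketch, assuming the dictionary is named german_hunspell as in the quoted mail below. ts_lexize returns NULL when the dictionary does not know the token, which is the case in which the next dictionary in the list (german_stem) takes over:

```sql
-- Ask the ispell-based dictionary alone how it handles each test word.
-- A NULL result means the word is unknown to german_hunspell.
SELECT word, ts_lexize('german_hunspell', word) AS lexemes
FROM (VALUES ('wasserkraft'), ('schifffahrt'), ('blindflansch')) AS t(word);
```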

This is my test configuration:

=# \dF+ german_compound
Text search configuration "public.german_compound"
Parser: "pg_catalog.default"
      Token      |        Dictionaries
-----------------+-----------------------------
 asciihword      | german_hunspell,german_stem
 asciiword       | german_hunspell,german_stem
 email           | simple
 file            | simple
 float           | simple
 host            | simple
 hword           | german_hunspell,german_stem
 hword_asciipart | german_hunspell,german_stem
 hword_numpart   | simple
 hword_part      | german_hunspell,german_stem
 int             | simple
 numhword        | simple
 numword         | simple
 sfloat          | simple
 uint            | simple
 url             | simple
 url_path        | simple
 version         | simple
 word            | german_hunspell,german_stem
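
For completeness, a configuration like this one can be created roughly as follows. This is only a sketch of the steps described in the quoted mail below, assuming the german_hunspell dictionary already exists; the token types are the ones listed above as using both dictionaries:

```sql
-- Start from a copy of the built-in German configuration.
CREATE TEXT SEARCH CONFIGURATION public.german_compound (
    COPY = pg_catalog.german
);

-- For every word-like token type, consult the Hunspell dictionary first
-- and fall back to the Snowball stemmer if it does not know the word.
ALTER TEXT SEARCH CONFIGURATION public.german_compound
    ALTER MAPPING FOR asciihword, asciiword, hword,
                      hword_asciipart, hword_part, word
    WITH german_hunspell, german_stem;

-- Check the resulting lexemes for one of the test words.
SELECT to_tsvector('public.german_compound', 'wasserkraft');
```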

On 28.05.2015 17:24, Oleg Bartunov wrote:
> ts_debug() ?
>
> =# select * from ts_debug('english', 'messages');
>    alias   |   description   |  token   |  dictionaries  |  dictionary  | lexemes
> -----------+-----------------+----------+----------------+--------------+----------
>  asciiword | Word, all ASCII | messages | {english_stem} | english_stem | {messag}
>
>
> On Thu, May 28, 2015 at 2:05 PM, Sven R. Kunze <srkunze(at)tbz-pariv(dot)de> wrote:
>
> Hi everybody,
>
> what do I need to do in order to enable compound word handling in
> PostgreSQL's tsvector implementation?
>
> I run PostgreSQL 9.3 on an Ubuntu 14.04 machine, have installed the
> hunspell-de-de package, and have already created a new dictionary as
> described here:
> http://www.postgresql.org/docs/9.3/static/textsearch-dictionaries.html#TEXTSEARCH-ISPELL-DICTIONARY
>
> CREATE TEXT SEARCH DICTIONARY german_hunspell (
>     TEMPLATE = ispell,
>     DictFile = de_de,
>     AffFile = de_de,
>     StopWords = german
> );
>
> Furthermore, I created a new test text search configuration (copied
> from german) and updated all token mappings that used the german_stem
> dictionary so that they use german_hunspell first and then
> german_stem.
>
> However, to_tsvector still does not produce the expected lexemes for compound words such as:
>
> wasserkraft -> wasserkraft, kraft
> schifffahrt -> schifffahrt, fahrt
> blindflansch -> blindflansch, flansch
>
> etc.
>
>
> What have I done wrong here?
>
> --
> Sven R. Kunze
> TBZ-PARIV GmbH, Bernsdorfer Str. 210-212, 09126 Chemnitz
> Tel: +49 (0)371 33714721, Fax: +49 (0)371 5347920
> e-mail: srkunze(at)tbz-pariv(dot)de
> web: www.tbz-pariv.de
>
> Geschäftsführer: Dr. Reiner Wohlgemuth
> Sitz der Gesellschaft: Chemnitz
> Registergericht: Chemnitz HRB 8543
>

--
Sven R. Kunze
TBZ-PARIV GmbH, Bernsdorfer Str. 210-212, 09126 Chemnitz
Tel: +49 (0)371 33714721, Fax: +49 (0)371 5347920
e-mail: srkunze(at)tbz-pariv(dot)de
web: www.tbz-pariv.de

Geschäftsführer: Dr. Reiner Wohlgemuth
Sitz der Gesellschaft: Chemnitz
Registergericht: Chemnitz HRB 8543
