[tsvector] to_tsvector called multiple times

From: "Sven R(dot) Kunze" <srkunze(at)tbz-pariv(dot)de>
To: pgsql-general(at)postgresql(dot)org
Subject: [tsvector] to_tsvector called multiple times
Date: 2015-05-26 08:18:35
Message-ID: 55642C5B.60809@tbz-pariv.de
Lists: pgsql-general

Hi everybody,

The following stemming results made me curious:

select to_tsvector('german', 'systeme'); > 'system':1
select to_tsvector('german', 'systemes'); > 'system':1
select to_tsvector('german', 'systems'); > 'system':1
select to_tsvector('german', 'systemen'); > 'system':1
select to_tsvector('german', 'system'); > 'syst':1

First of all, the last result seems to be a bug in the German stemmer:
'system' is reduced to 'syst' instead of being left as 'system'. Where
can I fix it?

Second, and more importantly, as I understand it, the stemmed version of
a word should be considered normalized. That is, all other versions of
that stem should be mapped to it as well. The interesting problem here
is that PostgreSQL maps the stem itself ('system') to a completely
different stem ('syst').

Should a stem not remain stable even when to_tsvector is called on it
multiple times?
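
To make the question concrete, here is a minimal sketch of how the
stability check could be run directly against the stemmer. It assumes
the 'german' configuration maps plain words to the built-in
'german_stem' Snowball dictionary (the default, as far as I know, but
an assumption for any customized setup). The column names stem_once and
stem_twice are only illustrative.

```sql
-- Stem each word once, then feed the resulting stem back into the same
-- dictionary. A stable stemmer would give identical values in both columns.
-- Assumption: 'german_stem' is the dictionary behind the 'german' config.
SELECT word,
       ts_lexize('german_stem', word) AS stem_once,
       ts_lexize('german_stem',
                 (ts_lexize('german_stem', word))[1]) AS stem_twice
FROM unnest(ARRAY['systeme', 'systemes', 'systems',
                  'systemen', 'system']) AS t(word);
```

Judging from the to_tsvector results above, I would expect stem_once
and stem_twice to differ for 'systeme', since its stem 'system' is
itself reduced to 'syst'.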

--
Sven R. Kunze
TBZ-PARIV GmbH, Bernsdorfer Str. 210-212, 09126 Chemnitz
Tel: +49 (0)371 33714721, Fax: +49 (0)371 5347920
e-mail: srkunze(at)tbz-pariv(dot)de
web: www.tbz-pariv.de

Geschäftsführer: Dr. Reiner Wohlgemuth
Sitz der Gesellschaft: Chemnitz
Registergericht: Chemnitz HRB 8543
