From: | Jens Sauer <jsauer65(at)googlemail(dot)com> |
---|---|
To: | Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: fulltext search and hunspell |
Date: | 2011-02-07 23:16:34 |
Message-ID: | AANLkTi=CtTJ6LPVshqHHjV+rx1GRq+fKFZJvrAMULpdT@mail.gmail.com |
Lists: | pgsql-general |
Hey,
thanks for your answer.
First I checked the links in the tsearch_data directory:
de_de.affix and de_de.dict are symlinks to the corresponding files in
/var/cache/postgresql/dicts/.
Then I recreated them using pg_updatedicts.
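The symlink check described above can also be scripted. The following is only a minimal sketch: the function name resolve_dict_links is made up for illustration, and the tsearch_data path is the one that appears in the error message later in this mail, so it may differ on other installations.

```python
import os


def resolve_dict_links(tsearch_dir, names=("de_de.affix", "de_de.dict")):
    """Return {name: (is_symlink, resolved_target, target_exists)}.

    Hypothetical helper: it only inspects the file system and does not
    talk to PostgreSQL at all.
    """
    report = {}
    for name in names:
        path = os.path.join(tsearch_dir, name)
        is_link = os.path.islink(path)
        # realpath follows the whole symlink chain to the final target.
        target = os.path.realpath(path)
        report[name] = (is_link, target, os.path.exists(target))
    return report


if __name__ == "__main__":
    # Assumed location, taken from the error context quoted in this mail.
    tsearch_dir = "/usr/share/postgresql/8.4/tsearch_data"
    for name, (is_link, target, exists) in resolve_dict_links(tsearch_dir).items():
        print(name, "symlink" if is_link else "not a symlink", target,
              "ok" if exists else "MISSING")
```

A dangling link shows up as MISSING, which would point at the dictionary package rather than at the text search configuration.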
This is an extract of the de_de.affix file:
# this is the affix file of the de_DE Hunspell dictionary
# derived from the igerman98 dictionary
#
# Version: 20091006 (build 20100127)
#
# Copyright (C) 1998-2009 Bjoern Jacke <bjoern(at)j3e(dot)de>
#
# License: GPLv2, GPLv3 or OASIS distribution license agreement
# There should be a copy of both of this licenses included
# with every distribution of this dictionary. Modified
# versions using the GPL may only include the GPL
SET ISO8859-1
TRY esijanrtolcdugmphbyfvkwqxzäüößáéêàâñESIJANRTOLCDUGMPHBYFVKWQXZÄÜÖÉ-.
PFX U Y 1
PFX U 0 un .
PFX V Y 1
PFX V 0 ver .
SFX F Y 35
[...]
I cannot find "compoundwords controlled z" there, so I manually added it.
[...]
# versions using the GPL may only include the GPL
compoundwords controlled z
SET ISO8859-1
TRY esijanrtolcdugmphbyfvkwqxzäüößáéêàâñESIJANRTOLCDUGMPHBYFVKWQXZÄÜÖÉ-.
[...]
Then I restarted PostgreSQL.
Now I get an error:
SELECT * FROM ts_debug('Schokoladenfabrik');
FEHLER: falsches Affixdateiformat für Flag
CONTEXT: Zeile 18 in Konfigurationsdatei
»/usr/share/postgresql/8.4/tsearch_data/de_de.affix«: »PFX U Y 1
«
SQL-Funktion »ts_debug« Anweisung 1
SQL-Funktion »ts_debug« Anweisung 1
Which means:
ERROR: wrong affix file format for flag
CONTEXT: line 18 of configuration file ...
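The error appears at the first PFX line after the manually added ispell-style "compoundwords" line. One possible reading, which is an assumption and not verified against the PostgreSQL source here, is that ispell-style and Hunspell-style directives do not mix in one affix file. A minimal sketch that lists both kinds of lines with their line numbers (scan_affix and mixes_styles are made-up names, and the keyword list is deliberately incomplete):

```python
# Hunspell-style directives that occur in this mail (not a complete list).
HUNSPELL_KEYS = ("COMPOUNDFLAG", "ONLYINCOMPOUND", "PFX", "SFX")


def scan_affix(lines):
    """Return (ispell_hits, hunspell_hits) as lists of (line_number, text)."""
    ispell_hits, hunspell_hits = [], []
    for number, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text or text.startswith("#"):
            continue  # skip blank lines and comments
        first = text.split()[0]
        if first.lower() == "compoundwords":
            # ispell-style line, e.g. "compoundwords controlled z"
            ispell_hits.append((number, text))
        elif first in HUNSPELL_KEYS:
            hunspell_hits.append((number, text))
    return ispell_hits, hunspell_hits


def mixes_styles(lines):
    """True if the file contains both ispell-style and Hunspell-style lines."""
    ispell_hits, hunspell_hits = scan_affix(lines)
    return bool(ispell_hits) and bool(hunspell_hits)
```

To run it on a real file, something like scan_affix(open(path, encoding="latin-1")) should work; latin-1 is used only because it decodes any byte sequence, and the directive keywords themselves are plain ASCII.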
If I add
COMPOUNDFLAG Z
ONLYINCOMPOUND L
instead of "compoundwords controlled z",
I no longer get an error:
SELECT * FROM ts_debug('Schokoladenfabrik');
   alias   |   description   |       token       |         dictionaries          | dictionary  |      lexemes
-----------+-----------------+-------------------+-------------------------------+-------------+-------------------
 asciiword | Word, all ASCII | Schokoladenfabrik | {german_hunspell,german_stem} | german_stem | {schokoladenfabr}
(1 row)
But the lexeme still comes from german_stem, so it seems that the hunspell
dictionary is not working for compound words.
Maybe pg_updatedicts has a bug and generates affix files in the wrong format?
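One thing that can be checked independently of PostgreSQL is whether any entries in de_de.dict actually carry the flag named in COMPOUNDFLAG, since the flag can only have an effect on words that are marked with it. The sketch below assumes the usual Hunspell .dict layout (a word count on the first line, then one "word/FLAGS" entry per line with single-character flags); count_flagged is a made-up name, and which flag the German dictionary really uses for compounds is not known from this mail.

```python
def count_flagged(dict_lines, flag):
    """Return (total_entries, entries_carrying_flag) for a Hunspell .dict file.

    Assumes single-character flags after the first "/" of each entry.
    """
    total = flagged = 0
    for raw in dict_lines:
        entry = raw.strip()
        if not entry or entry.isdigit():
            continue  # skip blank lines and the leading word-count line
        total += 1
        word, sep, rest = entry.partition("/")
        # Anything after whitespace is morphological data, not flags.
        parts = rest.split()
        flags = parts[0] if parts else ""
        if sep and flag in flags:
            flagged += 1
    return total, flagged
```

If count_flagged(open(path, encoding="latin-1"), "Z") reports zero flagged entries, the COMPOUNDFLAG Z line would have nothing to act on. The comparison is case-sensitive, so "Z" and "z" are counted separately.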
Jens
2011/2/7 Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>:
> Jens,
>
> could you check affix file for
> compoundwords controlled z
>
> also, can you provide link to dictionary files, so we can check if they
> supported, since we have only rudiment support of hunspell.
> btw,it'd be nice to have output from ts_debug() to make sure dictionaries
> actually used.
>
> Oleg