Re: [PROPOSAL] Improvements of Hunspell dictionaries support

From: Emre Hasegeli <emre(at)hasegeli(dot)com>
To: Artur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PROPOSAL] Improvements of Hunspell dictionaries support
Date: 2015-11-07 14:20:28
Message-ID: CAE2gYzwom3=11U9G8ZxMT5PLkZrwb12BWzxh4dB3HUd89FOSrg@mail.gmail.com
Lists: pgsql-hackers

Thank you for working on this.

I tried the patch with a Turkish dictionary [1] I found on the
Internet. It worked for some words but not for others:

> hasegeli=# create text search dictionary hunspell_tr (template = ispell, dictfile = tr, afffile = tr);
> CREATE TEXT SEARCH DICTIONARY
>
> hasegeli=# select ts_lexize('hunspell_tr', 'tilki'); -- The root "fox"
> ts_lexize
> -----------
> {tilki}
> (1 row)
>
> hasegeli=# select ts_lexize('hunspell_tr', 'tilkinin'); -- Genitive form, affix 3290
> ts_lexize
> -----------
> {tilki}
> (1 row)
>
> hasegeli=# select ts_lexize('hunspell_tr', 'tilkiler'); -- Plural form, affix 4371
> ts_lexize
> -----------
> {tilki}
> (1 row)
>
> hasegeli=# select ts_lexize('hunspell_tr', 'tilkiyi'); -- Accusative form, affix 2646
> ts_lexize
> -----------
>
> (1 row)

The problem seems to be related to the order of the affixes. The
accusative form is recognized if I move affix 2646 to the beginning
of the list.

[1] https://tr-spell.googlecode.com/files/dict_aff_5000_suffix_1130000_words.zip
