Re: hunspell and tsearch2 ?

From: Dirk Lutzebäck <dirk(dot)lutzebaeck(at)thinkproject(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: hunspell and tsearch2 ?
Date: 2012-08-31 13:07:24
Message-ID: 5040B70C.70805@thinkproject.com
Lists: pgsql-hackers

Hi Robert,

The PostgreSQL documentation has a note on this in section

12.6.5 Ispell Dictionary

*Note:* MySpell does not support compound words. Hunspell has
sophisticated support for compound words. At present, PostgreSQL
implements only the basic compound word operations of Hunspell.
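
For reference, here is a minimal sketch of how an ispell-template dictionary such as german_ispell is typically defined. The file base name "german" and the stop word list are assumptions for illustration, not necessarily the setup discussed in this thread.

```sql
-- Sketch only: the file base name "german" and the stop word list are
-- assumptions; the files must exist as german.dict, german.affix and
-- german.stop under $SHAREDIR/tsearch_data.
CREATE TEXT SEARCH DICTIONARY german_ispell (
    TEMPLATE  = ispell,
    DictFile  = german,
    AffFile   = german,
    StopWords = german
);

-- For compound splitting, an ispell-format affix file has to declare a
-- compound flag (for example "compoundwords controlled z"), and the
-- dictionary entries that may take part in compounds must carry that flag.

-- Check what the dictionary returns for a compound word.
SELECT ts_lexize('german_ispell', 'vollklimatisiert');
```

Whether the last query also returns the parts of the compound depends on the dictionary and affix files in use.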

Regards
Dirk

On 08/30/2012 05:39 PM, Robert Haas wrote:
> On Mon, Aug 27, 2012 at 8:31 AM, Dirk Lutzebäck
> <dirk(dot)lutzebaeck(at)thinkproject(dot)com> wrote:
>> We have issues with compound words in tsearch2 using the German (ispell)
>> dictionary. This has been discussed before but there is no real solution
>> using the recommended german dictionary at
>> http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2 (convert old
>> openoffice dict file to ispell suitable for tsearch):
>>
>> # select ts_lexize('german_ispell', 'vollklimatisiert');
>> ts_lexize
>> --------------------
>> {vollklimatisiert}
>> (1 row)
>>
>> This should return at least
>>
>> {vollklimatisiert, voll, klimatisiert}
>>
>>
>> The issue with compound words in ispell has been addressed in hunspell. But
>> hunspell has not been fully integrated into tsearch2 (according to the
>> documentation).
> Just out of curiosity, which part of the documentation are you looking
> at? The only mention of hunspell I see in the documentation is a
> mention that we apparently support their dictionary-file format.
>
>> Are there any plans to fully integrate hunspell into tsearch2? What is
>> needed to do this? What is the functional delta which is missing? Maybe we
>> can help...

--

Mit freundlichen Grüßen / Best regards,

*think project! International GmbH & Co. KG*

Dirk Lutzebäck
Geschäftsführer / Managing Director, CTO

Tel +49 30 921 017 90
Fax +49 30 921 017 50
dirk(dot)lutzebaeck(at)thinkproject(dot)com

Rechtliche Informationen zum Absender (Impressum):
www.thinkproject.com/de/info

Legal information (imprint): www.thinkproject.com/en/info
