Re: TSearch2 / German compound words / UTF-8

From: Alexander Presber <aljoscha(at)weisshuhn(dot)de>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: TSearch2 / German compound words / UTF-8
Date: 2006-01-27 14:11:13
Message-ID: 6AC64576-AEB6-47C0-AA8C-0242F9296BEA@weisshuhn.de
Lists: pgsql-general

>> Tsearch/ispell is not able to break this word into parts, because
>> of the "s" in "Produktion/s/intervall". Misspelling the word as
>> "Produktionintervall" fixes it:
> The affixes should be marked as 'affix in middle of compound word'.
> The flag is '~'; for an example, look in the norsk dictionary:
>
> flag ~\\:
> [^S] > S #~ advarsel > advarsels-
>
> BTW, we develop and debug compound word support on the norsk
> (Norwegian) dictionary, so look there for examples. But we don't
> know Norwegian; Norwegians helped us :)

Hello everyone!

I cannot get this to work, neither with a German dictionary nor with the
Norwegian example supplied on the tsearch website.
That is, just like Hannes, I get compound word support only without the
inserted 's', in both German and Norwegian:
"Vertragstrafe" works, but "Vertragsstrafe", which is the correct
form, does not.

So I tried it the other way around: My dictionary consists of two words:

---
vertrag/zs
strafe/z
---

My affix file just switches on compounds and allows s-insertion,
as described in the Norwegian tutorial:

---
compoundwords controlled z
suffixes
flag s:
[^S] > S # does not end in "s": append "s" and use in compound check ("Recht" > "Rechts-")
---

ts_debug yields:

tstest=# SELECT tsearch2.ts_debug('vertragstrafe strafevertrag vertragsstrafe');
                                      ts_debug
-------------------------------------------------------------------------------------
 (german,lword,"Latin word",vertragstrafe,"{ispell_de,simple}","'strafe' 'vertrag'")
 (german,lword,"Latin word",strafevertrag,"{ispell_de,simple}","'strafe' 'vertrag'")
 (german,lword,"Latin word",vertragsstrafe,"{ispell_de,simple}",'vertragsstrafe')
(3 rows)
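
To make clear what I expected from the two files above, here is a toy
model in Python. It is only a sketch of the intended splitting and says
nothing about how tsearch2 or ispell implement it; all names in it
(DICTIONARY, part_forms, split_compound) are made up for illustration.
It assumes the linking "s" may be appended only to a non-final part that
carries the s flag, and that the last part appears in its plain form.

```python
# Toy model of the expected behaviour; NOT how tsearch2/ispell is implemented.
# All names here are hypothetical and exist only for this illustration.

DICTIONARY = {"vertrag": "zs", "strafe": "z"}  # word -> flags, as in the dictionary file
COMPOUND_FLAG = "z"  # from "compoundwords controlled z"
LINK_FLAG = "s"      # suffix flag with the rule [^S] > S


def part_forms(word, flags):
    """Return the forms a word may take as a non-final compound part."""
    forms = [word]
    if LINK_FLAG in flags and not word.endswith("s"):
        forms.append(word + "s")  # "vertrag" -> "vertrags"
    return forms


def split_compound(text):
    """Split text into dictionary stems, or return None if no split exists."""
    for word, flags in DICTIONARY.items():
        if COMPOUND_FLAG not in flags:
            continue  # word is not allowed inside compounds
        if text == word:
            return [word]  # the final part must be the plain word
        for form in part_forms(word, flags):
            if text.startswith(form) and len(text) > len(form):
                rest = split_compound(text[len(form):])
                if rest:
                    return [word] + rest
    return None


for candidate in ("vertragstrafe", "strafevertrag", "vertragsstrafe"):
    print(candidate, split_compound(candidate))
```

Under these assumptions all three inputs split into the stems "vertrag"
and "strafe", including "vertragsstrafe", which is the case that
ts_debug leaves unsplit above.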

I would say the ispell compound support does not honor the s flag in
compounds.
Could it be that this feature got lost in a regression? It must have
worked for Norwegian once. (Take the "overtrekksgrilldresser" example
from the tsearch2:compounds tutorial, which I cannot reproduce either.)

Any hints?

Alexander
