Re: Fulltext search configuration

From: Mohamed <mohamed5432154321(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Fulltext search configuration
Date: 2009-02-02 14:40:56
Message-ID: 861fed220902020640s27279ad3o9cd9c4c26ed0066b@mail.gmail.com
Lists: pgsql-general

Oleg, as I mentioned earlier, I have a different .affix file that I got
from Andrew along with the stop file. I get no errors creating the dictionary
with that one, but I get nothing out of ts_lexize.
The size of that one is: 406,219 bytes
And the size of the hunspell one (the first): 406,229 bytes

A little too close, don't you think?

It might be that the Arabic hunspell (ayaspell) affix file is damaged on
some lines and that the one I got from Andrew is a fixed version.
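
With only 10 bytes between the two files, a line-by-line diff would show
exactly which affix lines were changed. Below is a minimal sketch of such a
comparison, assuming both files are UTF-8 text. The function names and the
second file name in the usage note are hypothetical; only hunarabic.affix
appears in this thread.

```python
import difflib
import sys
from pathlib import Path


def differing_lines(original_text: str, fixed_text: str) -> list[str]:
    """Return the changed lines of a unified diff, prefixed with '-' or '+'."""
    diff = list(
        difflib.unified_diff(
            original_text.splitlines(),
            fixed_text.splitlines(),
            fromfile="original",
            tofile="fixed",
            lineterm="",
            n=0,  # no context lines: show only what changed
        )
    )
    # The first two entries are the "---"/"+++" file headers; "@@" lines are
    # hunk markers. Everything else starting with '-' or '+' is a real change.
    return [line for line in diff[2:] if line.startswith(("-", "+"))]


def compare_files(original_path: str, fixed_path: str) -> list[str]:
    """Read two affix files (assumed UTF-8) and return their changed lines."""
    original = Path(original_path).read_text(encoding="utf-8", errors="replace")
    fixed = Path(fixed_path).read_text(encoding="utf-8", errors="replace")
    return differing_lines(original, fixed)


if __name__ == "__main__":
    # Expects two file paths on the command line.
    if len(sys.argv) == 3:
        for changed in compare_files(sys.argv[1], sys.argv[2]):
            print(changed)
```

Saved as, say, affdiff.py, it could be run as
"python affdiff.py hunarabic.affix fixed.affix"; lines starting with "-" exist
only in the first file and lines starting with "+" only in the second.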

Just wanted to let you know.

/ Moe

On Mon, Feb 2, 2009 at 3:25 PM, Mohamed <mohamed5432154321(at)gmail(dot)com> wrote:

> Ok, thank you Oleg.
> I have another dictionary package, which is also a conversion to hunspell:
>
>
> http://wiki.services.openoffice.org/wiki/Dictionaries#Arabic_.28North_Africa_and_Middle_East.29
> (Conversion of Buckwalter's Arabic morphological analyser) 2006-02-08
>
> And running that gives me this error : (again the affix file)
>
> ERROR: wrong affix file format for flag
> CONTEXT: line 560 of configuration file "C:/Program
> Files/PostgreSQL/8.3/share/tsearch_data/arabic_utf8_alias.affix": "PFX 1013
> Y 6
> "
>
> / Moe
>
>
>
> On Mon, Feb 2, 2009 at 2:41 PM, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> wrote:
>
>> Mohamed,
>>
>> We are looking into the problem.
>>
>> Oleg
>>
>> On Mon, 2 Feb 2009, Mohamed wrote:
>>
>>> No, I don't. But ts_lexize doesn't return anything, so I figured there
>>> must be an error somehow.
>>> I think we are using the same dictionary, except that I am using the
>>> stopwords file and a different affix file, because using the hunspell
>>> (ayaspell) .aff gives me this error:
>>>
>>> ERROR: wrong affix file format for flag
>>> CONTEXT: line 42 of configuration file "C:/Program
>>> Files/PostgreSQL/8.3/share/tsearch_data/hunarabic.affix": "PFX Aa Y 40
>>>
>>> / Moe
>>>
>>>
>>>
>>>
>>> On Mon, Feb 2, 2009 at 12:13 PM, Daniel Chiaramello <
>>> daniel(dot)chiaramello(at)golog(dot)net> wrote:
>>>
>>>> Hi Mohamed.
>>>>
>>>> I don't know where you got the dictionary. I tried the OpenOffice one
>>>> (the Ayaspell one) myself without success, and I had no Arabic stopwords
>>>> file.
>>>>
>>>> Renaming the file is supposed to be enough (I did it successfully for the
>>>> Thai dictionary): the ".aff" file becomes the ".affix" one.
>>>> When I tried to create the dictionary:
>>>>
>>>> CREATE TEXT SEARCH DICTIONARY ar_ispell (
>>>>     TEMPLATE = ispell,
>>>>     DictFile = ar_utf8,
>>>>     AffFile = ar_utf8,
>>>>     StopWords = english
>>>> );
>>>>
>>>> I had an error:
>>>>
>>>> ERREUR: mauvais format de fichier affixe pour le drapeau
>>>> CONTEXTE : ligne 42 du fichier de configuration
>>>> « /usr/share/pgsql/tsearch_data/ar_utf8.affix » : « PFX Aa Y 40
>>>>
>>>> (which means "wrong affix file format for flag", line 42 of the
>>>> configuration file)
>>>>
>>>> Do you have an error when creating your dictionary?
>>>>
>>>> Daniel
>>>>
>>>> Mohamed wrote:
>>>>
>>>>
>>>> I have run into some problems here.
>>>> I am trying to implement Arabic full-text search on three columns.
>>>>
>>>> To create a dictionary I have a hunspell dictionary and an Arabic stop
>>>> file.
>>>>
>>>> CREATE TEXT SEARCH DICTIONARY hunspell_dic (
>>>>     TEMPLATE = ispell,
>>>>     DictFile = hunarabic,
>>>>     AffFile = hunarabic,
>>>>     StopWords = arabic
>>>> );
>>>>
>>>>
>>>> 1) The problem is that the hunspell package contains a .dic and a .aff
>>>> file, but the configuration requires a .dict and a .affix file. I have
>>>> tried changing the extensions, but with no success.
>>>>
>>>> 2) ts_lexize('hunspell_dic', 'ARABIC WORD') returns nothing
>>>>
>>>> 3) How can I convert my .dic and .aff to valid .dict and .affix ?
>>>>
>>>> 4) I have read that when using dictionaries, if a word is not recognized
>>>> by any dictionary it will not be indexed. I find that troublesome. I
>>>> would like everything but the stop words to be indexed. I guess this
>>>> might be a step that I am not ready for yet, but I just wanted to put it
>>>> out there.
>>>>
>>>>
>>>>
>>>> Also, I would like to know what the process of implementing full-text
>>>> search looks like, from configuration to search.
>>>>
>>>> Create a dictionary, then a text search configuration, add the dictionary
>>>> to the configuration, index the columns with GIN or GiST ...
>>>>
>>>> What does a search look like? Does it match against the GIN/GiST index?
>>>> Has that index been built using the dictionary/configuration, or is the
>>>> dictionary only used on the search phrases?
>>>>
>>>> / Moe
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>> Regards,
>> Oleg
>> _____________________________________________________________
>> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
>> Sternberg Astronomical Institute, Moscow University, Russia
>> Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
>> phone: +007(495)939-16-83, +007(495)939-23-83
>>
>
>
