Re: Fulltext search configuration

From: Mohamed <mohamed5432154321(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Fulltext search configuration
Date: 2009-02-02 15:01:52
Message-ID: 861fed220902020701g7f2136e3w9e83a25f7517da1b@mail.gmail.com
Lists: pgsql-general

Hehe, ok.
I don't know either, but I took some lines from Al-Jazeera:
http://aljazeera.net/portal

I just made the change you suggested, created the dictionary successfully, and tried this:

select ts_lexize('ayaspell', 'استشهد فلسطيني وأصيب ثلاثة في غارة إسرائيلية جديدة')

but I got nothing... :(

Is there a way to make sure that words that are not recognized also get
indexed and searched for? (Not that I think this is the problem.)

/ Moe
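
For reference, a minimal sketch of the two points raised above. The PostgreSQL documentation describes ts_lexize as taking a single token rather than a whole sentence, so it is normally called one word at a time. Words that an ispell dictionary does not recognize can still be indexed if a catch-all dictionary, such as the built-in simple one, is listed after it in the configuration. The configuration name arabic_cfg is an assumption made up for illustration; ayaspell is the dictionary created above.

```sql
-- ts_lexize expects ONE token. It returns NULL when the dictionary does not
-- recognize the token, and an empty array when the token is a stop word.
SELECT ts_lexize('ayaspell', 'فلسطيني');
SELECT ts_lexize('ayaspell', 'غارة');

-- Hypothetical configuration (the name arabic_cfg is made up for this sketch).
-- Start from the built-in simple configuration, then try ayaspell first and
-- fall back to simple for any word that ayaspell does not recognize.
CREATE TEXT SEARCH CONFIGURATION arabic_cfg (COPY = pg_catalog.simple);

ALTER TEXT SEARCH CONFIGURATION arabic_cfg
    ALTER MAPPING FOR word, hword, hword_part
    WITH ayaspell, simple;

-- Whole sentences go through the parser and the configuration instead:
SELECT to_tsvector('arabic_cfg', 'استشهد فلسطيني وأصيب ثلاثة في غارة إسرائيلية جديدة');

-- ts_debug shows which dictionary handled each token.
SELECT token, dictionary, lexemes
FROM ts_debug('arabic_cfg', 'استشهد فلسطيني وأصيب ثلاثة');
```

With a mapping like this, a NULL from ts_lexize for a single word only means that ayaspell does not know the word; the word would still reach the index through simple.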

On Mon, Feb 2, 2009 at 3:50 PM, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> wrote:

> Mohamed,
>
> Comment out this line in ar.affix, like so:
> #FLAG long
> and creation of the ispell dictionary will work. This is a temporary solution;
> Teodor is working on fixing affix auto-recognition.
>
> I can't say anything about testing, since somebody needs to provide a
> test case first. I don't know how to type Arabic :)
>
>
> Oleg
>
> On Mon, 2 Feb 2009, Mohamed wrote:
>
>> Oleg, like I mentioned earlier, I have a different .affix file that I got
>> from Andrew along with the stop file. I get no errors creating the dictionary
>> with that one, but I get nothing out of ts_lexize.
>> The size of that one is: 406,219 bytes
>> And the size of the hunspell one (the first): 406,229 bytes
>>
>> A little too close, don't you think?
>>
>> It might be that the Arabic hunspell (ayaspell) affix file is damaged on
>> some lines, and the one I got from Andrew is the fixed version.
>>
>> Just wanted to let you know.
>>
>> / Moe
>>
>>
>>
>> On Mon, Feb 2, 2009 at 3:25 PM, Mohamed <mohamed5432154321(at)gmail(dot)com>
>> wrote:
>>
>>> Ok, thank you Oleg.
>>> I have another dictionary package, which is also a conversion to hunspell:
>>>
>>>
>>>
>>> http://wiki.services.openoffice.org/wiki/Dictionaries#Arabic_.28North_Africa_and_Middle_East.29
>>> (Conversion of Buckwalter's Arabic morphological analyser) 2006-02-08
>>>
>>> Running that gives me this error (again in the affix file):
>>>
>>> ERROR: wrong affix file format for flag
>>> CONTEXT: line 560 of configuration file "C:/Program Files/PostgreSQL/8.3/share/tsearch_data/arabic_utf8_alias.affix": "PFX 1013 Y 6"
>>>
>>> / Moe
>>>
>>>
>>>
>>> On Mon, Feb 2, 2009 at 2:41 PM, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> wrote:
>>>
>>>> Mohamed,
>>>>
>>>> We are looking into the problem.
>>>>
>>>> Oleg
>>>>
>>>> On Mon, 2 Feb 2009, Mohamed wrote:
>>>>
>>>>> No, I don't. But ts_lexize doesn't return anything, so I figured there
>>>>> must be an error somewhere.
>>>>> I think we are using the same dictionary, except that I am using the
>>>>> stopwords file and a different affix file, because using the hunspell
>>>>> (ayaspell) .aff gives me this error:
>>>>>
>>>>> ERROR: wrong affix file format for flag
>>>>> CONTEXT: line 42 of configuration file "C:/Program Files/PostgreSQL/8.3/share/tsearch_data/hunarabic.affix": "PFX Aa Y 40
>>>>>
>>>>> / Moe
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Feb 2, 2009 at 12:13 PM, Daniel Chiaramello <
>>>>> daniel(dot)chiaramello(at)golog(dot)net> wrote:
>>>>>
>>>>>> Hi Mohamed.
>>>>>>
>>>>>> I don't know where you got the dictionary. I tried the OpenOffice one
>>>>>> (the Ayaspell one) myself, without success, and I had no Arabic
>>>>>> stopwords file.
>>>>>>
>>>>>> Renaming the file is supposed to be enough (it worked for me with a
>>>>>> Thai dictionary): the ".aff" file becomes the ".affix" one.
>>>>>> When I tried to create the dictionary:
>>>>>>
>>>>>> CREATE TEXT SEARCH DICTIONARY ar_ispell (
>>>>>> TEMPLATE = ispell,
>>>>>> DictFile = ar_utf8,
>>>>>> AffFile = ar_utf8,
>>>>>> StopWords = english
>>>>>> );
>>>>>>
>>>>>> I got an error:
>>>>>>
>>>>>> ERREUR: mauvais format de fichier affixe pour le drapeau
>>>>>> CONTEXTE : ligne 42 du fichier de configuration « /usr/share/pgsql/tsearch_data/ar_utf8.affix » : « PFX Aa Y 40
>>>>>>
>>>>>> (which means "wrong affix file format for flag", at line 42 of the
>>>>>> configuration file)
>>>>>>
>>>>>> Do you have an error when creating your dictionary?
>>>>>>
>>>>>> Daniel
>>>>>>
>>>>>> Mohamed wrote:
>>>>>>
>>>>>>
>>>>>> I have run into some problems here.
>>>>>> I am trying to implement Arabic full-text search on three columns.
>>>>>>
>>>>>> To create a dictionary, I have a hunspell dictionary and an Arabic stop
>>>>>> file.
>>>>>>
>>>>>> CREATE TEXT SEARCH DICTIONARY hunspell_dic (
>>>>>> TEMPLATE = ispell,
>>>>>> DictFile = hunarabic,
>>>>>> AffFile = hunarabic,
>>>>>> StopWords = arabic
>>>>>> );
>>>>>>
>>>>>>
>>>>>> 1) The problem is that the hunspell package contains a .dic and a .aff
>>>>>> file, but the configuration requires a .dict and a .affix file. I have
>>>>>> tried changing the extensions, but with no success.
>>>>>>
>>>>>> 2) ts_lexize('hunspell_dic', 'ARABIC WORD') returns nothing.
>>>>>>
>>>>>> 3) How can I convert my .dic and .aff files to valid .dict and .affix
>>>>>> files?
>>>>>>
>>>>>> 4) I have read that when using dictionaries, a word that is not
>>>>>> recognized by any dictionary will not be indexed. I find that
>>>>>> troublesome. I would like everything but the stop words to be indexed.
>>>>>> I guess this might be a step that I am not ready for yet, but I just
>>>>>> wanted to put it out there.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Also, I would like to know what the process of implementing full-text
>>>>>> search looks like, from configuration to search.
>>>>>>
>>>>>> Create a dictionary, then a text search configuration, add the
>>>>>> dictionary to the configuration, index the columns with GIN or GiST ...
>>>>>>
>>>>>> What does a search look like? Does it match against the GIN/GiST
>>>>>> index? Has that index been built using the dictionary/configuration,
>>>>>> or is the dictionary only used on search phrases?
>>>>>>
>>>>>> / Moe
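
The steps listed above can be sketched roughly as follows. This is only an illustration under stated assumptions: the table articles, its columns title, summary and body, the configuration name arabic_fts and the index name articles_fts_idx are all made up for the example, while the dictionary definition is the one from the message. Documents are normalized through the configuration when the index is built, and the search phrase goes through the same configuration at query time, so the dictionary is applied on both sides.

```sql
-- 1. Dictionary, as defined in the message above. It needs hunarabic.dict,
--    hunarabic.affix and arabic.stop in the tsearch_data directory.
CREATE TEXT SEARCH DICTIONARY hunspell_dic (
    TEMPLATE = ispell,
    DictFile = hunarabic,
    AffFile = hunarabic,
    StopWords = arabic
);

-- 2. Configuration (hypothetical name). Word tokens go to hunspell_dic first;
--    anything it does not recognize falls through to the built-in simple
--    dictionary, so unrecognized words are still indexed.
CREATE TEXT SEARCH CONFIGURATION arabic_fts (COPY = pg_catalog.simple);

ALTER TEXT SEARCH CONFIGURATION arabic_fts
    ALTER MAPPING FOR word, hword, hword_part
    WITH hunspell_dic, simple;

-- 3. Hypothetical table with three text columns to search.
CREATE TABLE articles (
    id      serial PRIMARY KEY,
    title   text,
    summary text,
    body    text
);

-- 4. GIN expression index over the three columns, built with the configuration.
CREATE INDEX articles_fts_idx ON articles USING gin (
    to_tsvector('arabic_fts',
                coalesce(title, '') || ' ' ||
                coalesce(summary, '') || ' ' ||
                coalesce(body, ''))
);

-- 5. Search. The WHERE expression repeats the indexed expression exactly so
--    the planner can use the index; the query text is normalized with the
--    same configuration.
SELECT id, title
FROM articles
WHERE to_tsvector('arabic_fts',
                  coalesce(title, '') || ' ' ||
                  coalesce(summary, '') || ' ' ||
                  coalesce(body, ''))
      @@ plainto_tsquery('arabic_fts', 'غارة إسرائيلية');
```

A stored tsvector column kept up to date by a trigger is a common alternative to the expression index; the expression index is used here only because it keeps the sketch short.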
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>> Regards,
>>>> Oleg
>>>> _____________________________________________________________
>>>> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
>>>> Sternberg Astronomical Institute, Moscow University, Russia
>>>> Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
>>>> phone: +007(495)939-16-83, +007(495)939-23-83
>>>>
>>>>
>>>
>>>
>>
> Regards,
> Oleg
>
