Thank you for your detailed answer. I have learned a lot more about this stuff.
As I see it, according to the results it's between Hunspell and Aspell. My
Aspell version is 0.6, released in 2006. The Hunspell one was released in 2008.
When I run the Postgres command \dFt I get the following list :
So I set up my dictionary with the ispell as a template and Hunspell/Aspell
files. Now I just have one decision to make :)
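
A minimal sketch of that setup, not a tested recipe. It assumes the chosen Hunspell/Aspell files have been converted to the database encoding (UTF-8 here) and copied into $SHAREDIR/tsearch_data as ar.dict, ar.affix and ar.stop, since the ispell template looks for those extensions. The dictionary name arabic_ispell is only an illustrative choice.

```sql
-- Assumes ar.dict, ar.affix and ar.stop exist in $SHAREDIR/tsearch_data.
-- "arabic_ispell" is a hypothetical name picked for this example.
CREATE TEXT SEARCH DICTIONARY arabic_ispell (
    TEMPLATE  = ispell,  -- template used for Ispell/MySpell/Hunspell-style files
    DictFile  = ar,      -- resolves to ar.dict
    AffFile   = ar,      -- resolves to ar.affix
    StopWords = ar       -- resolves to ar.stop
);

-- Quick check: substitute a word taken from the dictionary file.
-- Returns an array of lexemes, an empty array for a stop word,
-- or NULL when the dictionary does not recognise the word.
SELECT ts_lexize('arabic_ispell', 'word');
```

Running the ts_lexize check against a few entries from each candidate file set is one way to compare the Hunspell and Aspell results without reading the files themselves.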
Just another thing:
> If you want to support multiple language dictionaries for a single table,
> with each row associated to its own dictionary
Not really. Since the two languages don't overlap, couldn't I set up two
separate dictionaries and index against both on the whole table? I think
that's what Oleg was referring to. Not sure...
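
One way this could look, as a sketch only: a single text search configuration that sends ASCII words to the English stemmer and all other words to the Arabic dictionary. It assumes an ispell-based dictionary called arabic_ispell already exists; that name, the configuration name ar_en, and the table docs with its text column body are all hypothetical.

```sql
-- Start from the built-in English configuration, so that ASCII tokens
-- (asciiword etc.) keep going through english_stem.
CREATE TEXT SEARCH CONFIGURATION ar_en (COPY = pg_catalog.english);

-- Route tokens containing non-ASCII letters (which covers Arabic words)
-- to the Arabic dictionary first, falling back to "simple" for anything
-- the dictionary does not recognise.
ALTER TEXT SEARCH CONFIGURATION ar_en
    ALTER MAPPING FOR word, hword, hword_part
    WITH arabic_ispell, simple;

-- One index covers both languages across the whole table.
CREATE INDEX docs_fts_idx ON docs
    USING gin (to_tsvector('ar_en', body));

-- Queries use the same configuration for Arabic and English terms alike.
SELECT *
FROM docs
WHERE to_tsvector('ar_en', body) @@ to_tsquery('ar_en', 'word');
```

With this layout there is no per-row configuration column; the token type decides which dictionary handles each word.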
Thanks for all the help / Moe
PS. I can't read Arabic, so I can't have a look at the files to decide :O
On Fri, Jan 9, 2009 at 2:14 PM, Andrew <archa(at)pacific(dot)net(dot)au> wrote:
> Hi Mohammed,
> See my answers below, and hopefully they won't lead you too far astray.
> Note though, it has been a long time since I have done this and there are
> doubtless more knowledgeable people in this forum who will be able to
> correct anything I say that may be misleading or incorrect.
> Mohamed wrote:
> no one ?
> / Moe
> On Thu, Jan 8, 2009 at 11:46 AM, Mohamed <mohamed5432154321(at)gmail(dot)com> wrote:
>> Ok, thank you all for your help. It has been very valuable. I am starting
>> to get the hang of it and have read almost the whole of chapter 12 plus
>> extras, but I still need a little bit of guidance.
>> I now have these files:
>> - An Arabic Hunspell rar file (OpenOffice version), which includes:
>>   - ar.dic
>>   - ar.aff
>> - An Aspell rar file that includes a lot of files
>> - A Myspell file (says simple words list)
>> - And also Andrew's two files:
>>   - ar.affix
>>   - ar.stop
>> I am thinking that I should go with just one of these, right, and that
>> it should be the Hunspell one?
> Hunspell is based on MySpell, extending it with support for complex
> compound words and Unicode characters; however, PostgreSQL cannot take
> advantage of Hunspell's compound word capabilities at present. Aspell is a
> GNU dictionary that replaces Ispell and supports UTF-8 characters. See
> http://aspell.net/test/ for comparisons between dictionaries, though be
> aware this test is hosted by Aspell... I will leave it to others to argue
> the merits of Hunspell vs. Aspell, and why you would choose one or the
> other.
>> There is an ar.aff file there and Andrew's file ends with .affix. Are
>> those perhaps similar? Should I skip Andrew's?
> The ar.aff file that comes with OpenOffice Hunspell dictionary is
> essentially the same as the ar.affix I supplied. Just open the two up,
> compare them and choose the one that you feel is best. A Hunspell
> dictionary will work better with a corresponding affix file.
>> Use just the ar.stop file?
> The ar.stop file lists common words that should be excluded from indexing.
> You will want a stop file as well as the dictionary and affix file. Feel
> free to modify the stop file to meet your own needs.
>> As for the per-row Arabic / English language search approach, I will
>> skip it and choose the approach suggested by Oleg:
>>   if arabic and english characters are not overlapped, you can use one
>> The Arabic letters and English letters or words don't overlap so that
>> should not be an issue? Will I be able to index and search against both
>> languages in the same query?
> If you want to support multiple language dictionaries for a single
> table, with each row associated to its own dictionary, use the
> tsvector_update_trigger_column trigger to automatically update your tsvector
> indexed column on insert or update. To support this, your table will need
> an additional column of type regconfig that contains the name of the
> text search configuration (and through it the dictionaries) to use when
> building the tsvector column for that particular row. See
> http://www.postgresql.org/docs/current/static/textsearch-features.html#TEXTSEARCH-UPDATE-TRIGGERS
> for more details. This will allow you to search across both languages in
> the one query as you were asking.
>> And also
>> 1. What language files should I use?
>> 2. What does my create dictionary statement for the Arabic language look like?
>> Perhaps like this:
>> CREATE TEXT SEARCH DICTIONARY arabic_dic (
>>     TEMPLATE = ?,    -- Not sure what this means
>>     DictFile = ar,   -- referring to ar.dic (Hunspell)
>>     AffFile = ar,    -- referring to ar.aff (Hunspell)
>>     StopWords = ar   -- referring to Andrew's stop file (what about Andrew's .affix file?)
>>     -- Anything more?
>> );
> From the psql command line you can find out what templates you have using
> the following command:
>   \dFt
> or by looking at the contents of the pg_ts_template table.
> If choosing a Hunspell or Aspell dictionary, I believe a value of TEMPLATE
> = ispell should be okay for you - see
> The template provides instructions to PostgreSQL on how to interact with the
> dictionary. The rest of the create dictionary statement appears fine to me.
>> Thanks again! / Moe