Re: What is the simpliest text search configuration?

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Jérôme Etévé <jerome(dot)eteve(at)gmail(dot)com>
Cc: Michael Nacos <m(dot)nacos(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: What is the simpliest text search configuration?
Date: 2009-11-12 16:24:03
Message-ID: Pine.LNX.4.64.0911121923190.6801@sn.sai.msu.ru
Lists: pgsql-general

We submitted an unaccent dictionary for 8.5.
See http://www.sai.msu.su/~megera/wiki/unaccent for some information.
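
A minimal sketch of how such a dictionary could be combined with 'simple'
to cover the original request (unaccent, tokenize, no stopwords, lowercase).
It assumes the unaccent module as it later shipped in contrib, and a server
new enough to support CREATE EXTENSION; the configuration name
simple_unaccent is hypothetical, chosen only for illustration:

```sql
-- Assumption: the contrib unaccent module is installed on the server.
CREATE EXTENSION IF NOT EXISTS unaccent;

-- Hypothetical configuration name; start from the built-in 'simple' setup.
CREATE TEXT SEARCH CONFIGURATION simple_unaccent ( COPY = pg_catalog.simple );

-- unaccent is a filtering dictionary: it strips accents and passes the
-- token on to 'simple', which lowercases it. No stopword file is
-- configured for 'simple' by default.
ALTER TEXT SEARCH CONFIGURATION simple_unaccent
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
    WITH unaccent, simple;

-- Inspect the resulting lexemes for an accented sample string.
SELECT to_tsvector('simple_unaccent', 'Hôtel de la Mer');
```

Listing unaccent before simple matters: dictionaries are consulted in order,
and a filtering dictionary has to come before the one that finally accepts
the token.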

Oleg
On Thu, 12 Nov 2009, Jérôme Etévé wrote:

> Hi Michael,
>
> I actually found that the 'simple' dictionary doesn't enforce a
> stopword list by default, so I defined my search configuration like
> this, and it works:
>
> create text search configuration sbsimple ( parser = 'default' );
> alter text search configuration sbsimple alter mapping for
>     word, hword, asciiword, asciihword with simple;
>
> Cheers!
>
> J.
>
> 2009/11/12 Michael Nacos <m(dot)nacos(at)gmail(dot)com>:
>> Dear Jerome,
>>
>> from personal experience, full-text searching in PostgreSQL can be quite
>> powerful, but it's not simple: it requires thought, planning and coding.
>> PostgreSQL mainly provides an efficient token-matching mechanism supporting
>> positional information and weights, but its natural language processing and
>> normalization are pretty basic.
>>
>> If you don't mind writing a couple of user-defined functions to take control
>> of lexeme normalization, then tsvector/tsquery support can be a very
>> powerful tool for custom search engines.
>>
>> regards,
>>
>> Michael
>>
>> 2009/11/12 Jérôme Etévé <jerome(dot)eteve(at)gmail(dot)com>
>>>
>>> Hi all,
>>>
>>> I'd like to implement a full text search with PostgreSQL, and I can't find
>>> a text search configuration that would just:
>>>
>>> map accented Unicode letters to their unaccented equivalents
>>> tokenize the words (and skip any non-word characters)
>>> apply no stopword filtering
>>> lowercase the tokens
>>>
>>> How can I achieve this? I'm particularly interested in deactivating
>>> the stopword filtering.
>>>
>>> I tried pg_catalog.simple, but despite its name, it still considers stop
>>> words.
>>>
>>> Thanks for your help!
>>>
>>> Jerome.
>>>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
