Skip site navigation (1) Skip section navigation (2)

Re: how to ignore accents?

From: "Celia McInnis" <celia(at)drmath(dot)ca>
To: pgsql-novice(at)postgresql(dot)org
Subject: Re: how to ignore accents?
Date: 2005-03-31 13:40:04
Message-ID: 20050331133155.M49612@drmath.ca (view raw or flat)
Thread:
Lists: pgsql-novice
Thanks, Michael - I just tried your suggestion in postgresql 8.0.1 and 
unfortunately it doesn't work. How long will it be before 8.0.2 moves out of 
beta mode?

Here's the error which I get in 8.0.1:

ERROR: invalid byte sequence for encoding "UNICODE": 0xe76169

Is there something which I can do in 8.0.1 to have the unaccenting work?

Celia McInnis

On Wed, 30 Mar 2005 17:42:36 -0700, Michael Fuhr wrote
> You could write an unaccent() function that calls translate().
> 
>   http://www.postgresql.org/docs/8.0/interactive/functions-string.html
> 
> Here's an (incomplete) example that works in PostgreSQL 8.0.2beta1:
> 
>   CREATE FUNCTION unaccent(text) RETURNS text AS $$
>   BEGIN
>       RETURN translate($1, '\342\347\350\351\352\364\373', 'aceeeou')
> ;  END;  $$ LANGUAGE plpgsql IMMUTABLE STRICT;
> 
>   SELECT unaccent('fran\347ais');
>    unaccent 
>   ----------
>    francais
>   (1 row)
> 
> You could use unaccent() in a functional index:
> 
>   CREATE TABLE words (
>       id    serial PRIMARY KEY,
>       word  text NOT NULL
>   );
> 
>   CREATE INDEX words_word_idx ON words (unaccent(word));
> 
>   INSERT INTO words (word) VALUES ('fran\347ais');
>   INSERT INTO words (word) VALUES ('caf\351');
>   INSERT INTO words (word) VALUES ('t\364\t');
> 
>   SELECT * FROM words WHERE unaccent(word) = 'francais';
>    id |   word   
>   ----+----------
>     1 | fran├žais
>   (1 row)
> 
> EXPLAIN shows that the index is being used:
> 
>   EXPLAIN SELECT * FROM words WHERE unaccent(word) = 'francais';
>                                     QUERY PLAN
> 
>   --------------------------------------------------------------------------
----
>    Index Scan using words_word_idx on words  (cost=0.00..11.31 
> rows=6 width=36)     Index Cond: (unaccent(word) = 'francais'::text)
>   (2 rows)
> 
> Hope this helps.
> 
> -- 
> Michael Fuhr
> http://www.fuhr.org/~mfuhr/
> 



> On Thu, Mar 31, 2005 at 02:01:45AM +0200, Ennio-Sr wrote:
> > * Celia McInnis <celia(at)drmath(dot)ca> [300305, 12:29]:
> > > I have French text with accents in it and would rather (for search 
purposes) 
> > > be able to search for things with the accents removed. Is there any 
builtin 
> > > postgres function which will do this?
> > > 
> > > Here is the format of some accented (French) words in my database:
> > > 
> > > fran\347ais caf\351 t\364\t br\373le r\352ver \342nes b\350gues
> > > 
> > > which I want to be able to search for as:
> > > 
> > > francais cafe tot brule rever anes begues
> > 

> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster


--
Open WebMail Project (http://openwebmail.org)


In response to

Responses

pgsql-novice by date

Next:From: Michael FuhrDate: 2005-03-31 16:29:19
Subject: Re: how to ignore accents?
Previous:From: Kim KohenDate: 2005-03-31 11:59:01
Subject: Boolean search

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group