Re: how to ignore accents?

From: Michael Fuhr <mike(at)fuhr(dot)org>
To: nasr(dot)laili(at)tin(dot)it, pgsql-novice(at)postgresql(dot)org
Subject: Re: how to ignore accents?
Date: 2005-03-31 00:42:36
Message-ID: 20050331004236.GA37961@winnie.fuhr.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

On Thu, Mar 31, 2005 at 02:01:45AM +0200, Ennio-Sr wrote:
> * Celia McInnis <celia(at)drmath(dot)ca> [300305, 12:29]:
> > I have French text with accents in it and would rather (for search purposes)
> > be able to search for things with the accents removed. Is there any builtin
> > postgres function which will do this?
> >
> > Here is the format of some accented (French) words in my database:
> >
> > fran\347ais caf\351 t\364\t br\373le r\352ver \342nes b\350gues
> >
> > which I want to be able to search for as:
> >
> > francais cafe tot brule rever anes begues
>
> I'm not an expert, but I'm afraid it's not possible to do that (and I'd
> be glad if somebody came out saying I'm wrong as I've the same problem).

You could write an unaccent() function that calls translate().

http://www.postgresql.org/docs/8.0/interactive/functions-string.html

Here's an (incomplete) example that works in PostgreSQL 8.0.2beta1:

CREATE FUNCTION unaccent(text) RETURNS text AS $$
BEGIN
RETURN translate($1, '\342\347\350\351\352\364\373', 'aceeeou');
END;
$$ LANGUAGE plpgsql IMMUTABLE STRICT;

SELECT unaccent('fran\347ais');
unaccent
----------
francais
(1 row)

You could use unaccent() in a functional index:

CREATE TABLE words (
id serial PRIMARY KEY,
word text NOT NULL
);

CREATE INDEX words_word_idx ON words (unaccent(word));

INSERT INTO words (word) VALUES ('fran\347ais');
INSERT INTO words (word) VALUES ('caf\351');
INSERT INTO words (word) VALUES ('t\364\t');

SELECT * FROM words WHERE unaccent(word) = 'francais';
id | word
----+----------
1 | français
(1 row)

EXPLAIN shows that the index is being used:

EXPLAIN SELECT * FROM words WHERE unaccent(word) = 'francais';
QUERY PLAN
------------------------------------------------------------------------------
Index Scan using words_word_idx on words (cost=0.00..11.31 rows=6 width=36)
Index Cond: (unaccent(word) = 'francais'::text)
(2 rows)

Hope this helps.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

In response to

Responses

Browse pgsql-novice by date

  From Date Subject
Next Message Noel Faux 2005-03-31 01:01:04 Test
Previous Message Ennio-Sr 2005-03-31 00:01:45 Re: how to ignore accents?