Re: how to ignore accents?

From: "Celia McInnis" <celia(at)drmath(dot)ca>
To: pgsql-novice(at)postgresql(dot)org
Subject: Re: how to ignore accents?
Date: 2005-03-31 13:40:04
Message-ID: 20050331133155.M49612@drmath.ca
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

Thanks, Michael - I just tried your suggestion in postgresql 8.0.1 and
unfortunately it doesn't work. How long will it be before 8.0.2 moves out of
beta mode?

Here's the error which I get in 8.0.1:

ERROR: invalid byte sequence for encoding "UNICODE": 0xe76169

Is there something which I can do in 8.0.1 to have the unaccenting work?

Celia McInnis

On Wed, 30 Mar 2005 17:42:36 -0700, Michael Fuhr wrote
> You could write an unaccent() function that calls translate().
>
> http://www.postgresql.org/docs/8.0/interactive/functions-string.html
>
> Here's an (incomplete) example that works in PostgreSQL 8.0.2beta1:
>
> CREATE FUNCTION unaccent(text) RETURNS text AS $$
> BEGIN
> RETURN translate($1, '\342\347\350\351\352\364\373', 'aceeeou')
> ; END; $$ LANGUAGE plpgsql IMMUTABLE STRICT;
>
> SELECT unaccent('fran\347ais');
> unaccent
> ----------
> francais
> (1 row)
>
> You could use unaccent() in a functional index:
>
> CREATE TABLE words (
> id serial PRIMARY KEY,
> word text NOT NULL
> );
>
> CREATE INDEX words_word_idx ON words (unaccent(word));
>
> INSERT INTO words (word) VALUES ('fran\347ais');
> INSERT INTO words (word) VALUES ('caf\351');
> INSERT INTO words (word) VALUES ('t\364\t');
>
> SELECT * FROM words WHERE unaccent(word) = 'francais';
> id | word
> ----+----------
> 1 | français
> (1 row)
>
> EXPLAIN shows that the index is being used:
>
> EXPLAIN SELECT * FROM words WHERE unaccent(word) = 'francais';
> QUERY PLAN
>
> --------------------------------------------------------------------------
----
> Index Scan using words_word_idx on words (cost=0.00..11.31
> rows=6 width=36) Index Cond: (unaccent(word) = 'francais'::text)
> (2 rows)
>
> Hope this helps.
>
> --
> Michael Fuhr
> http://www.fuhr.org/~mfuhr/
>

> On Thu, Mar 31, 2005 at 02:01:45AM +0200, Ennio-Sr wrote:
> > * Celia McInnis <celia(at)drmath(dot)ca> [300305, 12:29]:
> > > I have French text with accents in it and would rather (for search
purposes)
> > > be able to search for things with the accents removed. Is there any
builtin
> > > postgres function which will do this?
> > >
> > > Here is the format of some accented (French) words in my database:
> > >
> > > fran\347ais caf\351 t\364\t br\373le r\352ver \342nes b\350gues
> > >
> > > which I want to be able to search for as:
> > >
> > > francais cafe tot brule rever anes begues
> >

> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster

--
Open WebMail Project (http://openwebmail.org)

In response to

Responses

Browse pgsql-novice by date

  From Date Subject
Next Message Michael Fuhr 2005-03-31 16:29:19 Re: how to ignore accents?
Previous Message Kim Kohen 2005-03-31 11:59:01 Boolean search