| From: | PFC <lists(at)peufeu(dot)com> |
|---|---|
| To: | Diego Manilla Suárez <diego(dot)manilla(at)xeridia(dot)com>, pgsql-general(at)postgresql(dot)org |
| Subject: | Re: Accent insensitive search |
| Date: | 2007-06-21 09:56:44 |
| Message-ID: | op.tt9m8upxcigqcu@apollo13 |
| Lists: | pgsql-general |
> Hi. I have a few databases created with UNICODE encoding, and I would
> like to be able to search with accent insensitivity. There's something
> in Oracle (NLS_COMP, NLS_SORT) and SQL Server (don't remember) to do
> this, but I found nothing in PostgreSQL, just the 'to_ascii' function,
> which, AFAIK, doesn't work with UNICODE.
The easiest way is to add an extra column that holds a copy of your text with all accents removed. You can also convert it to lowercase and strip apostrophes, punctuation, etc. The column is kept up to date with a trigger (a sketch follows below).
Python is suitable for this (use unicodedata.normalize).
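For example, a minimal PL/Python sketch of such a function (this assumes the plpythonu language is installed; the function name and body here are illustrative, not the version in the attached file):

```sql
CREATE OR REPLACE FUNCTION remove_accents(text) RETURNS text AS $$
    import unicodedata
    # NFKD decomposition turns each accented letter into its base letter
    # plus a combining mark; dropping the combining marks strips the accents.
    s = unicodedata.normalize('NFKD', args[0].decode('utf-8'))
    s = u''.join([c for c in s if not unicodedata.combining(c)])
    return s.lower().encode('utf-8')
$$ LANGUAGE plpythonu IMMUTABLE STRICT;

-- SELECT remove_accents('Crème Brûlée');  yields 'creme brulee'
```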
Keeping a copy of the processed data speeds up searches compared to WHERE
remove_accents( blah ) = 'text', even with a functional index.
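To illustrate the precomputed-column approach, here is a sketch using a hypothetical docs table with a title column, and the remove_accents function from above:

```sql
-- Extra column holding the accent-stripped, lowercased copy.
ALTER TABLE docs ADD COLUMN title_search text;

-- The trigger keeps the copy in sync on every insert or update.
CREATE OR REPLACE FUNCTION docs_title_search_trig() RETURNS trigger AS $$
BEGIN
    NEW.title_search := remove_accents(NEW.title);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER docs_title_search
    BEFORE INSERT OR UPDATE ON docs
    FOR EACH ROW EXECUTE PROCEDURE docs_title_search_trig();

CREATE INDEX docs_title_search_idx ON docs (title_search);

-- The constant is normalized once; the scan uses a plain btree index.
SELECT * FROM docs WHERE title_search = remove_accents('Crème Brûlée');
```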
Note that this function could also be written in C, using a lookup table
over the first 64K Unicode code points for extra speed.
See attached file.
| Attachment | Content-Type | Size |
|---|---|---|
| create_ft_functions.sql | application/octet-stream | 5.5 KB |