Re: Searching with and without diacritical marks

From: Lynna Landstreet <lynna(at)gallery44(dot)org>
To: Ullrich Ralf <Ullrich(at)iai(dot)spk-berlin(dot)de>, "Pgsql-Novice (E-Mail)" <pgsql-novice(at)postgresql(dot)org>
Subject: Re: Searching with and without diacritical marks
Date: 2004-07-14 19:36:08
Message-ID: BD1B0368.15DC%lynna@gallery44.org
Lists: pgsql-novice

on 7/13/04 8:20 AM, Ullrich Ralf at Ullrich(at)iai(dot)spk-berlin(dot)de wrote:

> I have a multilingual portal running on PostgreSQL 7.4.2.
> My clients come mainly from Spain, Portugal, Latin America and Germany.
> The main feature of the site is a search engine that retrieves bibliographic
> data, which is stored in
> my database (Unicode!) with diacritical marks (e.g. Panamá, América); when
> users enter their search terms with diacritical marks, Postgres will find the
> requested records, but if a German user enters Panama or America (without
> diacritical marks), the search fails.
> Is there any extension for Postgres that allows for both modes of searching on
> the same data?

I've been wrestling with this issue too - I'm working on an art gallery
database which includes work from a number of French-Canadian artists, plus
a few from other countries where names and image titles typically involve
accents as well.

What I've been tentatively planning to do is to handle it in the PHP
frontend rather than the database itself, by setting up a function with
strtr() (string translate) that would strip out the accents while searching
so that results would come up regardless of whether users entered the right
accent, the wrong accent or no accent at all. The strtr() function allows
you to specify a number of pairs of strings to translate, so I could make up
a list of all the commonly used accented characters and have it translate
all search text with those. I'd apply it to both the search terms entered
and the text found, so that any "a" would match any other "a", regardless of
whether it was really an à, á, ä, â or just plain a (let's see if those
accents show up in anyone's e-mail...). It's kind of the way I'm handling
case sensitivity now.
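
A minimal sketch of the translation-table idea described above, written in
Python purely for illustration (in PHP the same thing would be a strtr() call
with an array of pairs). The character list, the names ACCENT_MAP, fold() and
matches(), and the substring comparison are all assumptions made for the
example, not a complete or tested table.

```python
# Hypothetical translation table: accented character -> plain equivalent.
# Only a handful of common precomposed Latin characters are listed; a real
# table would be longer, and combining accent marks are not handled here.
ACCENT_MAP = str.maketrans({
    "à": "a", "á": "a", "ä": "a", "â": "a", "ã": "a",
    "è": "e", "é": "e", "ë": "e", "ê": "e",
    "ì": "i", "í": "i", "ï": "i", "î": "i",
    "ò": "o", "ó": "o", "ö": "o", "ô": "o", "õ": "o",
    "ù": "u", "ú": "u", "ü": "u", "û": "u",
    "ñ": "n", "ç": "c",
})


def fold(text):
    """Lowercase the text, then replace the accented characters listed above."""
    return text.lower().translate(ACCENT_MAP)


def matches(search_term, stored_text):
    """Compare folded forms, so right, wrong or missing accents all match."""
    return fold(search_term) in fold(stored_text)


if __name__ == "__main__":
    print(fold("Panamá"))
    print(matches("America", "Historia de América"))
```

Folding both the search term and the stored text is what makes a wrong accent
match as well as a missing one; folding only the user's input would still miss
records stored with accents.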

Lynna

--
Resource Centre Database Coordinator
Gallery 44: www.gallery44.org
Database Project: www.gallery44db.org
