
Re: case insensitive match in unicode

From:	Martijn van Oosterhout <kleptog(at)svana(dot)org>
To:	SunWuKung <Balazs(dot)Klein(at)axelero(dot)hu>
Cc:	pgsql-general(at)postgresql(dot)org
Subject:	Re: case insensitive match in unicode
Date:	2006-03-27 09:48:29
Message-ID:	20060327094829.GA30791@svana.org
Lists:	pgsql-general

On Mon, Mar 27, 2006 at 11:31:17AM +0200, SunWuKung wrote:
> I would need to do case insensitive match against a field that contains
> text in different languages - Greek, Hungarian, Arabic etc.
> The db encoding is UTF8.
>
> So far I found no way to achieve that. I tried converting both strings
> to the same case and using ~* , but neither worked.

Oh, tricky. First, case-insensitive means different things in
different locales. For example, in Turkish 'i' is not the lowercase
version of 'I'. Maybe you've chosen a locale that doesn't do UTF-8? You
don't specify a platform either, and locale support varies wildly by
platform.
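To illustrate the Turkish point, here is a minimal sketch in Python (an assumption; the thread names no language). Python's str.lower() applies the default, locale-independent Unicode mapping, so 'I' always becomes 'i'. Turkish rules instead map 'I' to dotless 'ı' (U+0131) and dotted 'İ' (U+0130) to 'i'. The helper turkish_lower is hypothetical and only handles those two letters before falling back to the default mapping.

```python
def turkish_lower(s: str) -> str:
    # Hypothetical helper: apply the Turkish-specific mappings first,
    # 'I' -> dotless 'ı' (U+0131) and 'İ' (U+0130) -> plain 'i',
    # then use the default Unicode lowercasing for every other character.
    return s.replace("I", "\u0131").replace("\u0130", "i").lower()

# Default (locale-independent) Unicode mapping: 'I' -> 'i'
print("I".lower())
# Turkish rules: 'I' -> 'ı'
print(turkish_lower("I"))
print(turkish_lower("\u0130STANBUL"))  # 'İSTANBUL'
print(turkish_lower("ISPARTA"))
```

The same input therefore lowercases differently depending on which rules apply, which is why a case-insensitive comparison that is "correct" for one locale can be wrong for another.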

What you probably want is some kind of accent-insensitive match, meaning
that é, è, ë, e, É, È, E and Ë all match each other. The way to do that
is to convert the Unicode text to a particular (decomposed) normal form,
drop the accent marks, and then compare. Unfortunately, I don't think
PostgreSQL supplies such a function right now.
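The normalize-then-compare approach can be sketched in Python with the standard unicodedata module. This is an illustration under assumptions, not a PostgreSQL feature: the names fold and loose_match are hypothetical, and "accent" here means any character with a nonzero canonical combining class after NFD decomposition. That covers Latin, Greek and Hungarian accents, but not every mark in every script, so check the behaviour for Arabic or other scripts before relying on it.

```python
import unicodedata

def fold(s: str) -> str:
    # Decompose to NFD so e.g. 'é' becomes 'e' + U+0301 COMBINING ACUTE ACCENT.
    decomposed = unicodedata.normalize("NFD", s)
    # Drop characters with a nonzero canonical combining class (accents,
    # diaeresis, double acute, ...), keeping the base letters.
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    # casefold() is Unicode's fold for caseless matching (stronger than lower()).
    return stripped.casefold()

def loose_match(a: str, b: str) -> bool:
    # Hypothetical comparison: equal after accent stripping and case folding.
    return fold(a) == fold(b)

print([fold(c) for c in "éèëeÉÈEË"])   # all 'e'
print(loose_match("Άλφα", "ΑΛΦΑ"))      # Greek with and without tonos
print(loose_match("őrült", "ŐRÜLT"))    # Hungarian double acute / umlaut
```

Note that casefold() also does more than lowercasing, for example 'ß' folds to 'ss', so "Straße" and "STRASSE" match.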

However, some server-side procedural languages can do this. If you can
find one (possibly Perl) that can do the conversion, you can create a
function to do the mapping.

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

In response to

case insensitive match in unicode at 2006-03-27 09:31:17 from SunWuKung

Responses

Re: case insensitive match in unicode at 2006-03-27 10:45:05 from SunWuKung
