Re: Domains and supporting functions

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Domains and supporting functions
Date: 2006-02-19 22:06:38
Message-ID: 20060219220638.GI1323@svana.org
Lists: pgsql-hackers

On Sun, Feb 19, 2006 at 04:35:56PM -0500, Andrew Dunstan wrote:
> Have you looked at the code of citext? Unless I'm misreading, it creates
> a lowercase copy of each string for each comparison. And it doesn't look
> to me like it's encoding/locale aware.

Its cilower function isn't great and could probably do with some work.
toupper()/tolower() are encoding/locale sensitive, but the code as
written doesn't really handle multibyte encodings. Still, it's an
excellent starting point for creating new types because almost all the
hard work is done.
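
To illustrate the multibyte problem, here is a rough Python sketch (not
the citext code; bytewise_lower is a made-up name). It assumes a
byte-at-a-time, ASCII-only lowercase, which is roughly what a tolower()
loop does in the C locale, applied to UTF-8 data:

```python
def bytewise_lower(s: str) -> bytes:
    """Lowercase a string one byte at a time, ASCII letters only.

    Hypothetical stand-in for a per-byte tolower() loop in the C locale;
    this is not the actual citext implementation.
    """
    # bytes.lower() only maps the ASCII range A-Z to a-z.
    return s.encode("utf-8").lower()


# Pure ASCII input folds as expected.
print(bytewise_lower("HELLO") == bytewise_lower("hello"))            # True

# U+00C9 (E with acute) is two bytes in UTF-8 (0xC3 0x89). Neither byte
# is an ASCII letter, so it is left alone and the comparison fails.
print(bytewise_lower("\u00c9COLE") == bytewise_lower("\u00e9cole"))  # False

# A character-aware lowercase folds it correctly.
print("\u00c9COLE".lower() == "\u00e9cole")                          # True
```

In a single-byte locale such as Latin-1 the per-byte approach can work;
it is the multibyte case that falls over.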

> I'm not sure how hard a text type with efficient, encoding and locale
> aware, case-insensitive comparison would be to create, but it would be
> a Good Thing (tm) to have available.

Hmm, "case-insensitive match" is a very poorly defined concept.
There's a reason why there's a strcasecmp() but no strcasecoll(). The
code currently uses tolower(), but if you changed it to use toupper() it
would be equally valid yet produce different results.
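
A small sketch of that last point. Note the assumption: this uses
Python's Unicode case mappings rather than the C library's
tolower()/toupper(), so it only shows the general effect, not what any
particular libc does. eq_lower and eq_upper are names invented for the
example:

```python
def eq_lower(a: str, b: str) -> bool:
    """Case-insensitive equality by folding both sides to lower case."""
    return a.lower() == b.lower()


def eq_upper(a: str, b: str) -> bool:
    """Case-insensitive equality by folding both sides to upper case."""
    return a.upper() == b.upper()


pairs = [
    ("stra\u00dfe", "STRASSE"),  # German sharp s upper-cases to "SS"
    ("\u0131", "I"),             # dotless i upper-cases to plain "I"
]

for a, b in pairs:
    # Both pairs match when folded to upper case but not to lower case.
    print(repr(a), repr(b), "lower:", eq_lower(a, b), "upper:", eq_upper(a, b))
```

Both folds are defensible, and they disagree on both pairs.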

If/when we ever get to use a real internationalisation library like
ICU, we can do things like convert strings to Normal Form D so we can
compare characters separately from their accents, i.e. accent-insensitive
comparison. In any case ICU contains mappings for things like
title-case and all the different kinds of spaces and hyphens, so people
can specify their own mapping to get whatever they're happy with.
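
For the Normal Form D idea, a minimal sketch using Python's unicodedata
module as a stand-in for ICU (that substitution is an assumption made for
the example, as are the function names). It decomposes each string, drops
the combining marks, and compares what is left:

```python
import unicodedata


def strip_accents(s: str) -> str:
    """Decompose to NFD and drop combining marks (accents)."""
    decomposed = unicodedata.normalize("NFD", s)
    # unicodedata.combining() returns 0 for base characters.
    return "".join(c for c in decomposed if unicodedata.combining(c) == 0)


def accent_insensitive_eq(a: str, b: str) -> bool:
    """Compare two strings while ignoring accents (hypothetical helper)."""
    return strip_accents(a) == strip_accents(b)


# Precomposed e-acute (U+00E9) against plain "e".
print(accent_insensitive_eq("caf\u00e9", "cafe"))         # True

# Precomposed against "e" + combining acute (U+0301).
print(accent_insensitive_eq("caf\u00e9", "cafe\u0301"))   # True

# Different base letters still differ.
print(accent_insensitive_eq("cafe", "cafes"))             # False
```

This only handles accents that decompose into combining marks; letters
like the Danish o-slash have no such decomposition and would need an
explicit mapping.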

Until then, people will just have to rely on their system's support for
tolower().

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
