Case-Insensitve Text Comparison

From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Case-Insensitve Text Comparison
Date: 2008-06-02 03:53:25
Message-ID: D45A8C1C-1387-4B97-B329-67C80A3F1095@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Howdy,

I'm sure I'm just showing off my ignorance here, but here goes…

I really need case-insensitive string comparison in my database.
Ideally there'd be a nice ITEXT data type (and friends, ichar,
ivarchar, etc.). But of course there isn't, and for years I've just
used LOWER() on indexes and queries to get the same result.

Only it turns out that I'm of course not getting the same result. This
script:

#!/usr/local/bin/perl -w

use strict;
use warnings;
use utf8;
binmode STDOUT, ':utf8';
use DBI;

my $dbh = DBI->connect('dbi:Pg:dbname=try', 'postgres', '',
{ pg_enable_utf8 => 1 });
for my $char qw( À Á Â Ã Ä Å Ç Ć Č Ĉ Ċ Ď Đ A B C D ) {
print "$char: ", $dbh->selectrow_array('SELECT LOWER(?)', undef,
$char ), $/;
}

Yields this output:

À: À
Á: Á
Â: Â
Ã: Ã
Ä: Ä
Å: Å
Ç: Ç
Ć: Ć
Č: Č
Ĉ: Ĉ
Ċ: Ċ
Ď: Ď
Đ: Đ
A: a
B: b
C: c
D: d

So it doesn't really work on anything other than ASCII, it looks like.
So I have two questions:

1. Does the use of the tolower() C function in the citext data type on
pgfoundry basically give me the same results as using lower() in my
SQL has for all these years? IOW, does it convert letters to lowercase
in the same way that the LOWER() SQL function does? If so, I think I
might start to use it for my case-insensitive columns and simplify my
SQL a bit.

http://pgfoundry.org/projects/citext/

2. Isn't the ICU library distributed with PostgreSQL? And if so, could
it not be used to create proper case conversions in LOWER() and
friends and, ultimately, to create a case-insensitive text type in
core? I'm seeing that it has a constant named U_COMPARE_IGNORE_CASE
that can be used with its unorm_compare() function:

http://www.icu-project.org/apiref/icu4c/unorm_8h.html#6cc4c8b0d5be7ce1ac4b600ace7817f5
http://www.icu-project.org/apiref/icu4c/unorm_8h.html#991e0fe6f0d062dd6e8e924517f3f437

I don't really know C, but if that's stuff there, can't we take
advantage of it for proper case-insensitive comparisons (and
conversions)?

Thanks,

David

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2008-06-02 04:08:35 Re: Case-Insensitve Text Comparison
Previous Message Joe Conway 2008-06-02 00:14:13 Re: Add dblink function to check if a named connection exists