Skip site navigation (1) Skip section navigation (2)

Case-Insensitve Text Comparison

From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Case-Insensitve Text Comparison
Date: 2008-06-02 03:53:25
Message-ID: D45A8C1C-1387-4B97-B329-67C80A3F1095@kineticode.com (view raw or flat)
Thread:
Lists: pgsql-hackers
Howdy,

I'm sure I'm just showing off my ignorance here, but here goes…

I really need case-insensitive string comparison in my database.  
Ideally there'd be a nice ITEXT data type (and friends, ichar,  
ivarchar, etc.). But of course there isn't, and for years I've just  
used LOWER() on indexes and queries to get the same result.

Only it turns out that I'm of course not getting the same result. This  
script:

#!/usr/local/bin/perl -w

use strict;
use warnings;
use utf8;
binmode STDOUT, ':utf8';
use DBI;

my $dbh = DBI->connect('dbi:Pg:dbname=try', 'postgres', '',  
{ pg_enable_utf8 => 1 });
for my $char qw( À Á Â Ã Ä Å Ç Ć Č Ĉ Ċ Ď Đ A B C D ) {
     print "$char: ", $dbh->selectrow_array('SELECT LOWER(?)', undef,  
$char ), $/;
}

Yields this output:

À: À
Á: Á
Â: Â
Ã: Ã
Ä: Ä
Å: Å
Ç: Ç
Ć: Ć
Č: Č
Ĉ: Ĉ
Ċ: Ċ
Ď: Ď
Đ: Đ
A: a
B: b
C: c
D: d

So it doesn't really work on anything other than ASCII, it looks like.  
So I have two questions:

1. Does the use of the tolower() C function in the citext data type on  
pgfoundry basically give me the same results as using lower() in my  
SQL has for all these years? IOW, does it convert letters to lowercase  
in the same way that the LOWER() SQL function does? If so, I think I  
might start to use it for my case-insensitive columns and simplify my  
SQL a bit.

   http://pgfoundry.org/projects/citext/

2. Isn't the ICU library distributed with PostgreSQL? And if so, could  
it not be used to create proper case conversions in LOWER() and  
friends and, ultimately, to create a case-insensitive text type in  
core? I'm seeing that it has a constant named U_COMPARE_IGNORE_CASE  
that can be used with its unorm_compare() function:

   http://www.icu-project.org/apiref/icu4c/unorm_8h.html#6cc4c8b0d5be7ce1ac4b600ace7817f5
   http://www.icu-project.org/apiref/icu4c/unorm_8h.html#991e0fe6f0d062dd6e8e924517f3f437

I don't really know C, but if that's stuff there, can't we take  
advantage of it for proper case-insensitive comparisons (and  
conversions)?

Thanks,

David


Responses

pgsql-hackers by date

Next:From: Tom LaneDate: 2008-06-02 04:08:35
Subject: Re: Case-Insensitve Text Comparison
Previous:From: Joe ConwayDate: 2008-06-02 00:14:13
Subject: Re: Add dblink function to check if a named connection exists

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group