Re: Case-Insensitve Text Comparison

From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Case-Insensitve Text Comparison
Date: 2008-06-02 05:21:47
Message-ID: Pine.LNX.4.64.0806020919181.21547@sn.sai.msu.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

David,

we wrote contrib module (mchar) for one customer, which ports its
application from mssql to postgres. It does case-insensitive comparison
for new data type 'mchar' and linked with ICU for system independent locale.

Oleg

On Sun, 1 Jun 2008, David E. Wheeler wrote:

> Howdy,
>
> I'm sure I'm just showing off my ignorance here, but here goes
>
> I really need case-insensitive string comparison in my database. Ideally
> there'd be a nice ITEXT data type (and friends, ichar, ivarchar, etc.). But
> of course there isn't, and for years I've just used LOWER() on indexes and
> queries to get the same result.
>
> Only it turns out that I'm of course not getting the same result. This
> script:
>
> #!/usr/local/bin/perl -w
>
> use strict;
> use warnings;
> use utf8;
> binmode STDOUT, ':utf8';
> use DBI;
>
> my $dbh = DBI->connect('dbi:Pg:dbname=try', 'postgres', '', { pg_enable_utf8
> => 1 });
> for my $char qw( A B C D ) {
> print "$char: ", $dbh->selectrow_array('SELECT LOWER(?)', undef, $char ),
> $/;
> }
>
> Yields this output:
>
> :
> :
> :
> :
> :
> :
> :
> :
> :
> :
> :
> :
> :
> A: a
> B: b
> C: c
> D: d
>
> So it doesn't really work on anything other than ASCII, it looks like. So I
> have two questions:
>
> 1. Does the use of the tolower() C function in the citext data type on
> pgfoundry basically give me the same results as using lower() in my SQL has
> for all these years? IOW, does it convert letters to lowercase in the same
> way that the LOWER() SQL function does? If so, I think I might start to use
> it for my case-insensitive columns and simplify my SQL a bit.
>
> http://pgfoundry.org/projects/citext/
>
> 2. Isn't the ICU library distributed with PostgreSQL? And if so, could it not
> be used to create proper case conversions in LOWER() and friends and,
> ultimately, to create a case-insensitive text type in core? I'm seeing that
> it has a constant named U_COMPARE_IGNORE_CASE that can be used with its
> unorm_compare() function:
>
> http://www.icu-project.org/apiref/icu4c/unorm_8h.html#6cc4c8b0d5be7ce1ac4b600ace7817f5
> http://www.icu-project.org/apiref/icu4c/unorm_8h.html#991e0fe6f0d062dd6e8e924517f3f437
>
> I don't really know C, but if that's stuff there, can't we take advantage of
> it for proper case-insensitive comparisons (and conversions)?
>
> Thanks,
>
> David
>
>
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2008-06-02 05:27:23 Re: Overhauling GUCS
Previous Message Tom Lane 2008-06-02 05:18:55 Re: Case-Insensitve Text Comparison