From: | Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> |
---|---|
To: | "David E(dot) Wheeler" <david(at)kineticode(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Case-Insensitve Text Comparison |
Date: | 2008-06-02 05:21:47 |
Message-ID: | Pine.LNX.4.64.0806020919181.21547@sn.sai.msu.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
David,
we wrote contrib module (mchar) for one customer, which ports its
application from mssql to postgres. It does case-insensitive comparison
for new data type 'mchar' and linked with ICU for system independent locale.
Oleg
On Sun, 1 Jun 2008, David E. Wheeler wrote:
> Howdy,
>
> I'm sure I'm just showing off my ignorance here, but here goes
>
> I really need case-insensitive string comparison in my database. Ideally
> there'd be a nice ITEXT data type (and friends, ichar, ivarchar, etc.). But
> of course there isn't, and for years I've just used LOWER() on indexes and
> queries to get the same result.
>
> Only it turns out that I'm of course not getting the same result. This
> script:
>
> #!/usr/local/bin/perl -w
>
> use strict;
> use warnings;
> use utf8;
> binmode STDOUT, ':utf8';
> use DBI;
>
> my $dbh = DBI->connect('dbi:Pg:dbname=try', 'postgres', '', { pg_enable_utf8
> => 1 });
> for my $char qw( A B C D ) {
> print "$char: ", $dbh->selectrow_array('SELECT LOWER(?)', undef, $char ),
> $/;
> }
>
> Yields this output:
>
> :
> :
> :
> :
> :
> :
> :
> :
> :
> :
> :
> :
> :
> A: a
> B: b
> C: c
> D: d
>
> So it doesn't really work on anything other than ASCII, it looks like. So I
> have two questions:
>
> 1. Does the use of the tolower() C function in the citext data type on
> pgfoundry basically give me the same results as using lower() in my SQL has
> for all these years? IOW, does it convert letters to lowercase in the same
> way that the LOWER() SQL function does? If so, I think I might start to use
> it for my case-insensitive columns and simplify my SQL a bit.
>
> http://pgfoundry.org/projects/citext/
>
> 2. Isn't the ICU library distributed with PostgreSQL? And if so, could it not
> be used to create proper case conversions in LOWER() and friends and,
> ultimately, to create a case-insensitive text type in core? I'm seeing that
> it has a constant named U_COMPARE_IGNORE_CASE that can be used with its
> unorm_compare() function:
>
> http://www.icu-project.org/apiref/icu4c/unorm_8h.html#6cc4c8b0d5be7ce1ac4b600ace7817f5
> http://www.icu-project.org/apiref/icu4c/unorm_8h.html#991e0fe6f0d062dd6e8e924517f3f437
>
> I don't really know C, but if that's stuff there, can't we take advantage of
> it for proper case-insensitive comparisons (and conversions)?
>
> Thanks,
>
> David
>
>
>
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru)
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2008-06-02 05:27:23 | Re: Overhauling GUCS |
Previous Message | Tom Lane | 2008-06-02 05:18:55 | Re: Case-Insensitve Text Comparison |