Re: Case-Insensitve Text Comparison

From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Case-Insensitve Text Comparison
Date: 2008-06-02 16:45:27
Message-ID: 172A4089-7D72-4AA3-8F3A-59BED6F92119@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Jun 2, 2008, at 09:33, Tom Lane wrote:

>> Would the use of str_tolower() in formatting.c fix that?
>
> Yeah, you need something equivalent to that. I think that whole area
> is due for refactoring, though --- we've got kind of a weird
> collection
> of upper/lower/initcap APIs spread through a couple of different
> files.

And I just ran into this on 8.3 when trying to install citext:

psql:citext.sql:350: ERROR: there is no built-in function named
"oid_text"

I'm assuming that this is because a lot of automatic casts were
removed in 8.3 or 8.2; There are a bunch of these:

CREATE FUNCTION citext(oid) RETURNS citext AS 'oid_text' LANGUAGE
'internal' IMMUTABLE STRICT;
GRANT EXECUTE ON FUNCTION citext(oid) TO PUBLIC;
COMMENT ON FUNCTION citext(oid) IS 'convert oid to citext';
CREATE FUNCTION oid(citext) RETURNS oid AS 'text_oid' LANGUAGE
'internal' IMMUTABLE STRICT;
GRANT EXECUTE ON FUNCTION oid(citext) TO PUBLIC;
COMMENT ON FUNCTION oid(citext) IS 'convert citext to oid';
CREATE CAST (citext AS oid) WITH FUNCTION oid(citext);
CREATE CAST (oid AS citext) WITH FUNCTION citext(oid);

CREATE FUNCTION citext(int2) RETURNS citext AS 'int2_text' LANGUAGE
'internal' IMMUTABLE STRICT;
GRANT EXECUTE ON FUNCTION citext(int2) TO PUBLIC;
COMMENT ON FUNCTION citext(int2) IS 'convert int2 to citext';
CREATE FUNCTION int2(citext) RETURNS int2 AS 'text_int2' LANGUAGE
'internal' IMMUTABLE STRICT;
GRANT EXECUTE ON FUNCTION int2(citext) TO PUBLIC;
COMMENT ON FUNCTION int2(citext) IS 'convert citext to int2';
CREATE CAST (citext AS int2) WITH FUNCTION int2(citext);
CREATE CAST (int2 AS citext) WITH FUNCTION citext(int2);

And on and on. Clearly this module needs updating for newer
PostgreSQLs. I tried removing them all in order to get the data type
and tried it out with this script:

my $dbh = DBI->connect('dbi:Pg:dbname=try', 'postgres', '',
{ pg_enable_utf8 => 1 });
for my $char qw( À Á Â Ã Ä Å Ç Ć Č Ĉ Ċ Ď Đ A B C D ) {
print "$char: ", $dbh->selectrow_array('SELECT LOWER(?::citext)',
undef, $char ), $/;
}

Naturally it didn't work:

À: Ã
Á: á
Â: â
Ã: ã
Ä: ä
Å: Ã¥
Ç: ç
Ć: ć
Č: č
Ĉ: ĉ
Ċ: ċ
Ď: ď
Đ: đ
A: a
B: b
C: c
D: d

BTW, I rebuilt my cluster with --locale en_US.UTF-8 and the script
works on a text type, so having a locale is key.

Thanks,

David

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2008-06-02 16:46:44 Re: Proposal: new function array_init
Previous Message Tom Lane 2008-06-02 16:33:30 Re: Case-Insensitve Text Comparison