Skip site navigation (1) Skip section navigation (2)


From: David E(dot) Wheeler <david(at)kineticode(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: PATCH: CITEXT 2.0
Date: 2008-06-28 01:22:25
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-hackers

[N.B.: I tried to send this a while ago but it didn't get delivered,  
I'm assuming because, with the uncompressed patch, the email was too  
big for -hackers. So this is a re-send with the patch gzip'd. Sorry  
for any duplication].

Please find attached a patch adding a locale-aware, case-insensitive  
text type, called citext, as a contrib module. A few notes:

* I had originally called it lctext, as it's not a true case- 
insensitive type, but just converts strings to lowercase before  
comparing them. I changed it to citext at Tom Lane's suggestion, to  
ease compatibility for users of the original citext module on pgFoundry.

* Differences from the original citext are:
  + Locale-aware lowercasing of strings, rather than just lowercasing
    ASCII characters.
  + No implicit casts from text to citext except via assignment.
  + A few more functions overloaded
  + Works with 8.3 and CVS head

* Many thanks to whoever added str_tolower() to formatting.c. If I had  
known about that, I could have saved myself a lot of grief! My  
original implementation for 8.3.1 had copied a lot of code from  
oracle_compat.c to get things working. With this patch, I've  
eliminated a whole lot of code, as I can now just call str_tolower().  
So thank you for that! I'll probably keep my original in my personal  
Subversion repository, but don't now about releasing it if it will be  
accepted as a contrib module for 8.4.

* All comparisons simply convert the strings to be compared to  
lowercase using str_tolower(). I've made no other optimizations,  
though I'm sure someone with more experience with collations and such  
could add them.

* The regression test uses a new module I've created, now on  
pgFoundry, called pgtap. It should just work. sql/citext.sql adds  
plpgsql to the database and then includes pgtap.sql, which has the  
test functions in it.

* I wrote the tests assuming a collation of en_US.UTF-8. I expect it'd  
work with most West European languages, and maybe all languages other  
than the C locale, but I'm not sure. YMMV. If there's a way to  
generalize it and still be able to test the locale awareness, that  
would be great. What locales do the build farm servers use?

* In the documentation, I've pitched this type as a replacement for  
the use of LOWER() in ad-hoc queries, while also stipulating that this  
is not a "true" case-insensitive text type, and is furthermore less  
efficient than just TEXT (though I'm sure more efficient than ad-hock  
LOWER()s). I've also mentioned a few other caveats, including casts  
for TEXT that don't work for citext and non-case-insensitive matching  
in replace(), regexp_replace(), and a few others.

* I wrote all the code here myself, but of course used the original  
citext implementation (which is case-insensitive only for ASCII  
characters) for inspiration and guidance. Thanks to Donald Fraser for  
that original implementation.

I've compiled the CVS checkout, run its regressions, then built and  
installed the citext module (hence my discovery of the deprecation of  
wstring_lower and the addition of str_tolower -- should the  
declaration of the former be removed from formatting.c?), and all  
tests passed as of an hour ago.

I of course welcome feedback, advice, insults, commiserations, and  
just about any mode of comment on this patch. Please let me know if I  
need to provide any additional information.



Attachment: citext.patch.gz
Description: application/x-gzip (16.0 KB)


pgsql-hackers by date

Next:From: Tom LaneDate: 2008-06-28 02:34:26
Subject: Re: Vacuuming leaked temp tables (once again)
Previous:From: Michael PaesoldDate: 2008-06-28 00:32:50
Subject: Re: Vacuuming leaked temp tables (once again)

Privacy Policy | About PostgreSQL
Copyright © 1996-2018 The PostgreSQL Global Development Group