pgsql: Add caching of ctype.h/wctype.h results in regc_locale.c.

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-committers(at)postgresql(dot)org
Subject: pgsql: Add caching of ctype.h/wctype.h results in regc_locale.c.
Date: 2012-02-20 02:02:12
Message-ID: E1RzIZw-0002tE-7c@gemulon.postgresql.org
Lists: pgsql-committers

Add caching of ctype.h/wctype.h results in regc_locale.c.

While this doesn't save a huge amount of runtime, it still seems worth
doing, especially since I realized that the data copying I did in my first
draft was quite unnecessary. In this version, once we have the results
cached, getting them back for re-use is really very cheap.
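The caching idea can be illustrated with a minimal sketch. This is not the
code from the commit: the names class_probe, class_cache, get_class_chars,
class_contains, full_scans and MAX_CACHED_CHR are all hypothetical, and the
cache is keyed only on the probe function, whereas a real implementation
would presumably also have to key on the locale in effect. The point it
shows is that a cache hit hands back a pointer to the stored result rather
than a copy of it.

```c
#include <assert.h>
#include <stdlib.h>
#include <wctype.h>

/* Hypothetical upper bound on the character codes we probe. */
#define MAX_CACHED_CHR 0x7FF

/* A wctype.h-style classification function, e.g. iswalpha. */
typedef int (*class_probe) (wint_t c);

/* One cached result: all codes in 0..MAX_CACHED_CHR accepted by probe. */
typedef struct class_cache
{
    class_probe probe;          /* function that produced this entry */
    int         nchars;         /* number of valid entries in chars[] */
    wint_t     *chars;          /* matching codes, owned by the cache */
    struct class_cache *next;   /* next entry in a simple linked list */
} class_cache;

static class_cache *cache_head = NULL;
static int  full_scans = 0;     /* counts expensive scans, for illustration */

/*
 * Return the cached result for probe, building it on first use.
 * The caller gets a pointer into the cache and must not free or modify it.
 * Returns NULL only if memory allocation fails.
 */
const class_cache *
get_class_chars(class_probe probe)
{
    class_cache *cc;
    wint_t      c;

    assert(probe != NULL);

    /* Cache hit: return the stored entry without copying anything. */
    for (cc = cache_head; cc != NULL; cc = cc->next)
    {
        if (cc->probe == probe)
            return cc;
    }

    /* Cache miss: scan the whole range once and remember the result. */
    cc = malloc(sizeof(class_cache));
    if (cc == NULL)
        return NULL;
    cc->chars = malloc(sizeof(wint_t) * (MAX_CACHED_CHR + 1));
    if (cc->chars == NULL)
    {
        free(cc);
        return NULL;
    }
    cc->probe = probe;
    cc->nchars = 0;
    for (c = 0; c <= MAX_CACHED_CHR; c++)
    {
        if (probe(c))
            cc->chars[cc->nchars++] = c;
    }
    full_scans++;

    cc->next = cache_head;
    cache_head = cc;
    return cc;
}

/* Linear membership test over a cached result. */
int
class_contains(const class_cache *cc, wint_t c)
{
    int         i;

    for (i = 0; i < cc->nchars; i++)
    {
        if (cc->chars[i] == c)
            return 1;
    }
    return 0;
}
```

In this sketch, calling get_class_chars(iswalpha) twice performs the scan
over 0..MAX_CACHED_CHR only once; the second call just walks the short list
and returns the same pointer.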

Also, remove the hard-wired restriction against considering wctype.h
results for character codes above 255. It turns out that we can't push the
limit as far up as I'd originally hoped, because the regex colormap code is
not efficient enough to cope very well with character classes containing
many thousands of letters, which a Unicode locale is entirely capable of
producing. Still, we can push it up to U+7FF (which I chose as the limit of
2-byte UTF-8 characters), which will at least make Eastern Europeans happy
pending a better solution. Thus, this commit resolves the specific
complaint in bug #6457, but not the more general issue that letters of
non-Western alphabets are mostly not recognized as matching [[:alpha:]].
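To make the U+7FF boundary concrete, here is a small sketch of how many
bytes UTF-8 needs for a given code point. The function name
utf8_encoded_length is hypothetical and does not come from the commit; the
ranges are those of the UTF-8 encoding itself. The sketch does not reject
surrogate code points.

```c
/*
 * Number of bytes UTF-8 uses to encode code point cp,
 * or 0 if cp is beyond the Unicode range.
 */
int
utf8_encoded_length(unsigned int cp)
{
    if (cp <= 0x7F)
        return 1;               /* ASCII */
    if (cp <= 0x7FF)
        return 2;               /* largest 2-byte code point is U+7FF */
    if (cp <= 0xFFFF)
        return 3;
    if (cp <= 0x10FFFF)
        return 4;
    return 0;                   /* not a valid Unicode code point */
}
```

Under this rule, U+017E (Latin small letter z with caron) and U+044F
(Cyrillic small letter ya) both take two bytes and so fall inside the new
limit, while U+0800 and anything above it takes three or more and does not.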

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/e00f68e49c148851187136d3278b7e9afa370537

Modified Files
--------------
src/backend/regex/regc_locale.c    |  119 +++++++-------------
src/backend/regex/regc_pg_locale.c |  222 +++++++++++++++++++++++++++++++++++-
2 files changed, 260 insertions(+), 81 deletions(-)
