From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Alvaro Herrera <alvherre(at)commandprompt(dot)com> |
Cc: | pgsql-hackers(at)postgreSQL(dot)org |
Subject: | Re: A thought about regex versus multibyte character sets |
Date: | 2009-12-01 21:52:26 |
Message-ID: | 21003.1259704346@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Tom Lane wrote:
>> I just spent a bit of time considering what we might do to fix this.
>> The idea mentioned in the above thread was to switch over to using
>> wchar_t in the regex code, but that seems to have a number of problems.
>> One showstopper is that on some platforms wchar_t is only 16 bits and
>> can't represent the full range of Unicode characters. I don't want to
>> fix case-folding only to break regexes for other uses.
> We have a TODO item about having a regex specific data type. Would
> implementing that solve this problem?
No, not particularly --- the stumbling block here is really impedance
mismatch between our internal APIs and libc's standard locale support.
The TODO item that would fix it is implementing our own locale support;
but I ain't holding my breath for that one.
AFAIR the motivation for a regex data type was solely performance.
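To make the wchar_t width concern quoted above concrete, here is a minimal sketch, not code from the PostgreSQL tree. It assumes the regex engine carries characters in a 32-bit internal type (the `codepoint_t` name and all three helper functions are hypothetical) and that wchar_t values in the active locale correspond to Unicode code points, which libc does not guarantee. It consults libc's towlower() only when the code point fits in the platform's wchar_t.

```c
#include <assert.h>
#include <stdint.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical stand-in for a 32-bit internal character type. */
typedef uint32_t codepoint_t;

/* Nonzero if this platform's wchar_t can hold every Unicode code point. */
static int wchar_holds_full_unicode(void)
{
    return WCHAR_MAX >= 0x10FFFF;
}

/* Nonzero if this particular code point is representable as wchar_t. */
static int fits_in_wchar(codepoint_t c)
{
    return (uintmax_t) c <= (uintmax_t) WCHAR_MAX;
}

/*
 * Case-fold one code point. ASCII is handled locally so the result does
 * not depend on the locale; anything else goes to libc only when it fits
 * in wchar_t. ASSUMPTION: wchar_t values are Unicode code points.
 */
static codepoint_t fold_lower(codepoint_t c)
{
    assert(c <= 0x10FFFF);      /* caller must pass a valid code point */

    if (c < 128)
        return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
    if (fits_in_wchar(c))
        return (codepoint_t) towlower((wint_t) c);
    return c;                   /* not representable: leave unchanged */
}
```

On a platform with a 16-bit wchar_t, wchar_holds_full_unicode() returns 0 and fold_lower() returns code points above 0xFFFF unchanged. That keeps regexes working for such characters, but they get no case folding.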
regards, tom lane