Re: A thought about regex versus multibyte character sets

From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: A thought about regex versus multibyte character sets
Date: 2009-12-01 21:46:11
Message-ID: 20091201214611.GH5013@alvh.no-ip.org
Lists: pgsql-hackers

Tom Lane wrote:

> I just spent a bit of time considering what we might do to fix this.
> The idea mentioned in the above thread was to switch over to using
> wchar_t in the regex code, but that seems to have a number of problems.
> One showstopper is that on some platforms wchar_t is only 16 bits and
> can't represent the full range of Unicode characters. I don't want to
> fix case-folding only to break regexes for other uses.

We have a TODO item about having a regex-specific data type. Would
implementing that solve this problem?
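
For readers following the quoted 16-bit concern, here is a minimal C sketch of why a 16-bit wchar_t cannot hold every Unicode character in one unit. It is not taken from the thread or from PostgreSQL's regex code; the helper names utf16_units and to_surrogates are hypothetical and exist only for illustration.

```c
#include <stdint.h>

/* Hypothetical helper (not PostgreSQL code): how many 16-bit units are
 * needed to store one Unicode code point. Anything above U+FFFF does
 * not fit in a single 16-bit unit. */
static int
utf16_units(uint32_t cp)
{
    return cp > 0xFFFF ? 2 : 1;
}

/* Hypothetical helper: split a code point above U+FFFF into a UTF-16
 * surrogate pair. The caller must pass cp in the range
 * 0x10000..0x10FFFF. */
static void
to_surrogates(uint32_t cp, uint16_t *hi, uint16_t *lo)
{
    cp -= 0x10000;                            /* 20 significant bits remain */
    *hi = (uint16_t) (0xD800 + (cp >> 10));   /* top 10 bits */
    *lo = (uint16_t) (0xDC00 + (cp & 0x3FF)); /* bottom 10 bits */
}
```

On a platform where wchar_t is 16 bits, a character such as U+1D11E therefore occupies two array elements, so code that treats one wchar_t as one character (for example, when case-folding or matching a bracket expression) would see two unrelated halves instead.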

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
