Re: A thought about regex versus multibyte character sets

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: A thought about regex versus multibyte character sets
Date: 2009-12-01 03:13:10
Message-ID: 17821.1259637190@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
> I therefore propose the following idea: if the database encoding is
> UTF8, allow the regc_locale.c functions to call the <wctype.h>
> functions, assuming that wchar_t and pg_wchar_t share the same
> representation. On platforms where wchar_t is only 16 bits, we can do
> this up to U+FFFF and be stupid about code points above that.

Or to be concrete, how about the attached? It seems to do what's
wanted, but I'm hardly the best-qualified person to test it.
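A minimal sketch of the quoted idea, assuming a UTF8 database so that a pg_wchar holds a Unicode code point directly. This is not the attached patch. The typedef below only mirrors PostgreSQL's 32-bit code-point type, and the helper names (pg_wc_fits_wchar, utf8_iswalpha, utf8_iswdigit) are hypothetical. Code points are passed straight to the <wctype.h> classifiers whenever they fit in wchar_t. On platforms with a 16-bit wchar_t (WCHAR_MAX == 0xFFFF), anything above U+FFFF is simply reported as "not in the class", which is the "be stupid" fallback:

```c
#include <wchar.h>   /* wchar_t, WCHAR_MAX */
#include <wctype.h>  /* iswalpha, iswdigit, wint_t */

/* Assumption: stands in for PostgreSQL's pg_wchar (a 32-bit code point). */
typedef unsigned int pg_wchar;

/*
 * Hypothetical helper: can this code point be handed to the wctype
 * functions unchanged?  With a 16-bit wchar_t, WCHAR_MAX is 0xFFFF, so
 * code points above U+FFFF are rejected; with a 32-bit wchar_t every
 * valid Unicode code point passes.
 */
static int pg_wc_fits_wchar(pg_wchar c)
{
    return c <= (pg_wchar) WCHAR_MAX;
}

/* Hypothetical: letter test that relies on wchar_t == Unicode code point. */
static int utf8_iswalpha(pg_wchar c)
{
    if (!pg_wc_fits_wchar(c))
        return 0;                      /* out of range: treat as non-letter */
    return iswalpha((wint_t) c) != 0;
}

/* Hypothetical: digit test under the same assumption. */
static int utf8_iswdigit(pg_wchar c)
{
    if (!pg_wc_fits_wchar(c))
        return 0;                      /* out of range: treat as non-digit */
    return iswdigit((wint_t) c) != 0;
}
```

The results for non-ASCII code points still depend on the LC_CTYPE locale set via setlocale(), which is why the proposal only makes sense when the locale's wide-character encoding actually matches Unicode.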

regards, tom lane

Attachment: utf8-regex-1.patch (text/x-patch, 6.6 KB)
