Re: Regexps vs. locale

From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Regexps vs. locale
Date: 2008-12-08 17:39:31
Message-ID: 87vdtuo9bg.fsf@news-spur.riddles.org.uk
Lists: pgsql-hackers

>>>>> "Tom" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> writes:
>> Obviously, this happens because the locale support functions in
>> backend/regex/regc_locale.c are (presumably intentionally)
>> crippled so as not to support non-ascii chars, despite all the
>> code there using wide chars for everything otherwise.

Tom> It's not so much intentional as that no one has gotten around to
Tom> making it work. The difficulty is that the wide-char codes we
Tom> are using might not match what the <wctype.h> functions expect,
Tom> and it's unclear what we could do to fix that.

Couldn't we follow the example of lower() and convert the string to
wchar_t using mbstowcs (rather than to pg_wchar using pg_mb2wchar)?

This obviously requires that we have a matching lc_ctype for the
encoding, but we insist on that now anyway, no?
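
To make the idea concrete, here is a rough standalone sketch (not
PostgreSQL code) of the mbstowcs route: convert the multibyte string to
wchar_t under the current LC_CTYPE, then classify with the <wctype.h>
functions. The helper name count_alpha_mb is hypothetical and exists only
for illustration; the sketch assumes the string's encoding matches the
active locale.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper, for illustration only: count the alphabetic
 * characters in a multibyte string by converting it to wchar_t with
 * mbstowcs() and testing each character with iswalpha().
 *
 * Assumes the string is encoded in the encoding of the current
 * LC_CTYPE locale.  Returns -1 on allocation failure or if the string
 * is not valid in that encoding.
 */
static long
count_alpha_mb(const char *s)
{
    /* A string never has more wide characters than bytes. */
    size_t   maxlen = strlen(s);
    wchar_t *w = malloc((maxlen + 1) * sizeof(wchar_t));
    size_t   n;
    size_t   i;
    long     count = 0;

    if (w == NULL)
        return -1;

    /* Convert using the locale's own idea of wide characters. */
    n = mbstowcs(w, s, maxlen + 1);
    if (n == (size_t) -1)
    {
        free(w);
        return -1;          /* invalid multibyte sequence */
    }

    /* The wchar_t values now match what <wctype.h> expects. */
    for (i = 0; i < n; i++)
    {
        if (iswalpha((wint_t) w[i]))
            count++;
    }

    free(w);
    return count;
}
```

A caller would first select the locale, e.g. with setlocale(LC_CTYPE, ""),
and then call count_alpha_mb() on a string in that locale's encoding. The
same conversion step would presumably sit in front of iswupper(),
towlower() and the rest.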

--
Andrew.
