Re: BUG #6457: Regexp not processing word (with special characters on ends) correctly (UTF-8)

From: Duncan Rance <postgres(at)dunquino(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: albert(dot)cieszkowski(at)cc(dot)com(dot)pl, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #6457: Regexp not processing word (with special characters on ends) correctly (UTF-8)
Date: 2012-02-15 09:21:13
Message-ID: 35CBD9EE-B188-4FD2-B1D6-2576B06D3BC4@dunquino.com
Lists: pgsql-bugs

On 14 Feb 2012, at 18:28, Tom Lane wrote:
> 
> Oh, I see the reason for this: the code in cclass() in regc_locale.c
> doesn't go further up than U+00FF, so no codes above that will be
> thought to be letters (or members of any other character class).
> Clearly we need to go further when we are dealing with UTF8.
> I'm not sure what a sane limit would be though.

The Basic Multilingual Plane goes up to U+FFFF:

https://en.wikipedia.org/wiki/Mapping_of_Unicode_characters#Planes
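
To make the effect of the limit concrete, here is a rough Python sketch. It is not PostgreSQL's code, and it does not call cclass(); it only mimics a character class that was enumerated up to some cap, using Python's own Unicode tables. The helper names and the sample word are made up for illustration.

```python
import unicodedata

LATIN1_MAX = 0x00FF  # the cap Tom describes for cclass()
BMP_MAX = 0xFFFF     # last code point of the Basic Multilingual Plane


def is_letter(cp: int) -> bool:
    """True if the code point's Unicode general category is a letter (L*)."""
    return unicodedata.category(chr(cp)).startswith("L")


def capped_is_letter(cp: int, limit: int) -> bool:
    """Mimic a letter class that was only enumerated up to `limit`.

    Hypothetical helper: anything above the cap is never a member,
    whatever Unicode says about it.
    """
    return cp <= limit and is_letter(cp)


def count_letters(limit: int) -> int:
    """Number of letter code points in the range 0..limit inclusive."""
    return sum(1 for cp in range(limit + 1) if is_letter(cp))


# Hypothetical sample word: "łódź" (U+0142, U+00F3, U+0064, U+017A).
word = "\u0142\u00f3d\u017a"

latin1_view = [capped_is_letter(ord(c), LATIN1_MAX) for c in word]
bmp_view = [capped_is_letter(ord(c), BMP_MAX) for c in word]

print(latin1_view)  # [False, True, True, False]
print(bmp_view)     # [True, True, True, True]
print(count_letters(LATIN1_MAX), count_letters(BMP_MAX))
```

With the cap at U+00FF the first and last characters of the sample word are not treated as letters, which is the sort of "special characters on ends" symptom in the subject line. Raising the cap to U+FFFF covers them, at the cost of scanning many more code points, and it still leaves out everything in the supplementary planes.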
