Re: BUG #6457: Regexp not processing word (with special characters on ends) correctly (UTF-8)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: albert(dot)cieszkowski(at)cc(dot)com(dot)pl
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #6457: Regexp not processing word (with special characters on ends) correctly (UTF-8)
Date: 2012-02-14 18:28:11
Message-ID: 11041.1329244091@sss.pgh.pa.us
Lists: pgsql-bugs

albert(dot)cieszkowski(at)cc(dot)com(dot)pl writes:
> peimp=> select 'Świnoujście' ~* '\mŚwinoujście\M';
>  ?column?
> ----------
>  f
> (1 row)

Oh, I see the reason for this: the code in cclass() in regc_locale.c
doesn't go further up than U+00FF, so no codes above that will be
thought to be letters (or members of any other character class).
Clearly we need to go further when we are dealing with UTF8.
I'm not sure what a sane limit would be though.
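
The effect of such a cutoff can be sketched outside the server. The Python
below is only an illustration, not the code in regc_locale.c:
build_letter_class and its limit parameter are hypothetical names, Python's
str.isalpha stands in for the locale's letter test, and the larger limit of
U+07FF is an arbitrary value picked for comparison, not a proposed fix.

```python
def build_letter_class(limit):
    """Collect every code point up to `limit` (inclusive) that counts as a letter.

    Hypothetical stand-in for a character-class builder; str.isalpha()
    plays the role of the locale's letter test.
    """
    return {cp for cp in range(limit + 1) if chr(cp).isalpha()}


S_ACUTE = 0x015A  # LATIN CAPITAL LETTER S WITH ACUTE, which lies above U+00FF

narrow = build_letter_class(0x00FF)  # scan stops at U+00FF
wide = build_letter_class(0x07FF)    # arbitrary larger limit, for comparison only

print(S_ACUTE in narrow)  # False
print(S_ACUTE in wide)    # True
```

With the narrow scan, the first character of the reported word never enters
the letter class, which is consistent with the word-boundary match failing.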

(It would be nice if there were a more efficient way to get this
information than laboriously iterating through all the possible
character codes.  It doesn't look like we're even trying to cache
the results, ick.)
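
As a rough sketch of what caching could look like, again in Python and not
in the regex code itself: the scan runs once per distinct limit and later
requests reuse the stored set. The names letter_class and scan_count are
hypothetical, and functools.lru_cache is just one convenient way to memoize.

```python
from functools import lru_cache

scan_count = 0  # counts how often the full scan actually runs


@lru_cache(maxsize=None)
def letter_class(limit):
    """Hypothetical cached builder: scan code points once per distinct limit."""
    global scan_count
    scan_count += 1
    return frozenset(cp for cp in range(limit + 1) if chr(cp).isalpha())


first = letter_class(0x07FF)
second = letter_class(0x07FF)  # served from the cache, no second scan

print(scan_count)  # 1
```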

			regards, tom lane
