From: | David Smith <srekcahlqsgp(at)jerusalem(dot)plus(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-hackers(at)postgreSQL(dot)org |
Subject: | Re: Regex code versus Unicode chars beyond codepoint 255 |
Date: | 2012-02-15 19:18:43 |
Message-ID: | Pine.LNX.4.44.1202152050100.2772-100000@localhost.localdomain |
Lists: | pgsql-hackers |
On 2010-11-24 at 15:56, Tom Lane wrote:
> Bug #5766 points out that we're still not there yet in terms of having
> sane behavior for locale-specific regex operations in Unicode
> encoding. The reason it's not working is that regc_locale does this to
> expand the set of characters that are considered to match [[:alnum:]]:
> <SNIP>
It appears that nobody answered that email.
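To illustrate the problem named in the subject line, here is a minimal Python sketch. The quoted regc_locale code is snipped, so this is not that code; it only assumes that the [[:alnum:]] class is built by probing codepoints 0 to 255, and it uses Python's Unicode-aware `str.isalnum()` as a stand-in for the locale's classification. `build_alnum_class` is a hypothetical helper.

```python
# Hypothetical model: a character class built by probing a limited codepoint
# range. str.isalnum() stands in for the locale's own classification.
def build_alnum_class(max_codepoint: int) -> set[str]:
    """Collect every character up to max_codepoint that tests as alphanumeric."""
    return {chr(cp) for cp in range(max_codepoint + 1) if chr(cp).isalnum()}


latin1_alnum = build_alnum_class(255)

# Latin-1 letters are found; Hebrew, Greek and Arabic letters are not,
# even though each of them is alphanumeric in Unicode.
for ch in ["a", "é", "ש", "λ", "ب"]:
    print(repr(ch), "unicode alnum:", ch.isalnum(), "in class:", ch in latin1_alnum)
```

Under this assumption, any word made of Hebrew, Greek, or Arabic letters contains no "alnum" characters at all, so word-boundary constraints that depend on the class cannot find it.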
I am currently implementing a library system that needs to search by
whole words. I am using \m...\M regexes, and the database is UTF-8,
containing text in Hebrew, Greek, Arabic, and various European scripts.
I need a way to do whole-word searches on this data, which means either
fixing [[:alnum:]] under UTF-8 to include characters from all scripts,
or manually generating a list of all word characters and reimplementing
word-start/word-end matching in the regex myself. I would prefer to
avoid the latter if at all possible!
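For comparison, the "do it myself" fallback can be sketched outside the database. This minimal Python sketch assumes the PostgreSQL definition of a word character for \m and \M (an alnum character or an underscore) but swaps in Unicode-aware `str.isalnum()`; `is_word_char` and `find_whole_word` are hypothetical helpers, not PostgreSQL functions.

```python
def is_word_char(ch: str) -> bool:
    # Assumption: word character = alnum or underscore, with alnum decided
    # by Unicode (str.isalnum) rather than a 0..255 lookup table.
    return ch.isalnum() or ch == "_"


def find_whole_word(text: str, word: str) -> list[int]:
    """Return start offsets where `word` has no word character on either side."""
    if not word:
        return []
    hits = []
    start = text.find(word)
    while start != -1:
        end = start + len(word)
        # Emulates \m: the preceding character must not be a word character.
        before_ok = start == 0 or not is_word_char(text[start - 1])
        # Emulates \M: the following character must not be a word character.
        after_ok = end == len(text) or not is_word_char(text[end])
        if before_ok and after_ok:
            hits.append(start)
        start = text.find(word, start + 1)
    return hits


print(find_whole_word("ספר הספרים ספר", "ספר"))
```

The standalone "ספר" is matched at both ends of the string, while the occurrence inside "הספרים" is rejected because it is preceded by a Hebrew letter. Note that combining marks such as Hebrew niqqud are not alphanumeric in Unicode, so pointed text would need extra handling under this definition.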
What is the current status of a full character list for [[:alnum:]] in
UTF-8, and is there anything I can do to help get it working?
Thanks,
David