Re: Regexp match with accented character problem

From: Laslo Forro <getforum(at)gmail(dot)com>
To: Thom Brown <thombrown(at)gmail(dot)com>
Cc: pgsql-novice(at)postgresql(dot)org
Subject: Re: Regexp match with accented character problem
Date: 2010-06-08 11:59:27
Message-ID: AANLkTimDsYu1RPbNvksjm0Wh3jcj7JU4ocH3Ka_P4ggG@mail.gmail.com
Lists: pgsql-novice

Perhaps this helps:

'ó' matches
\M
\M\M\M
\.*

but not \M\M\M\M or \M\M\M\W

These match:
E'\\mmacskacicó\M*'
E'\\mmacskacicó\s*'
E'\\mmacskacicó\W*'

with the * quantifier, but not with the + quantifier or without any quantifier.
These also match:

E'\\mmacskacicó\\Y' (!!!)
E'\\mmacskacicó$'

The text is typed into psql in a urxvt terminal.
Perhaps it is some kind of Unicode wide-character mess?
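
One way to look into the Unicode question outside the database is to inspect what 'ó' actually consists of. The sketch below is Python rather than SQL and is only an illustration; the helper name describe is made up for it. It assumes the text is UTF-8, as the database encoding suggests, and compares the precomposed 'ó' (code point U+00F3, decimal 243) with the decomposed form, a plain 'o' followed by a combining acute accent. Which form the terminal actually sent is not known from this thread.

```python
import unicodedata

# Precomposed LATIN SMALL LETTER O WITH ACUTE, written as an escape so the
# example does not depend on the encoding of the source file.
ch = "\u00f3"


def describe(text):
    """Return (code points, UTF-8 bytes) of text as readable hex strings."""
    points = " ".join(f"U+{ord(c):04X}" for c in text)
    utf8 = " ".join(f"{b:02x}" for b in text.encode("utf-8"))
    return points, utf8


nfc = unicodedata.normalize("NFC", ch)  # composed: a single code point
nfd = unicodedata.normalize("NFD", ch)  # decomposed: 'o' + combining accent

print("decimal code point:", ord(ch))
print("NFC:", describe(nfc))
print("NFD:", describe(nfd))
```

Neither form is a single byte in UTF-8, which may matter if some step in the chain handles the text byte by byte instead of character by character.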

On Tue, Jun 8, 2010 at 1:26 PM, Laslo Forro <getforum(at)gmail(dot)com> wrote:

> The problem might be that 'ó' is not recognized as \w.
> Actually, I do not know which class 'ó' is in. It matches:
>
> test=# select * from texts where title ~* E'\\mmacskacic\\M';
> title | a_text
> --------------+----------------------------
> A macskacicó | A blah blah macskacicónak.
> (1 row)
>
> As if the end of the word were at the last 'c'. ???
>
> If the hex code of 'ó' is 97 (decimal 151), could someone give me a hint on how to
> insert it into the expression?
>
> On Tue, Jun 8, 2010 at 1:17 PM, Laslo Forro <getforum(at)gmail(dot)com> wrote:
>
>> Thanks a lot, anyway!
>>
>>
>> On Tue, Jun 8, 2010 at 12:56 PM, Thom Brown <thombrown(at)gmail(dot)com> wrote:
>>
>>> On 8 June 2010 11:54, Laslo Forro <getforum(at)gmail(dot)com> wrote:
>>> > test=# \l
>>> >                               List of databases
>>> >    Name    |  Owner   | Encoding |  Collation  |    Ctype    |   Access privileges
>>> > -----------+----------+----------+-------------+-------------+-----------------------
>>> >  postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
>>> >  template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres
>>> >                                                              : postgres=CTc/postgres
>>> >  template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres
>>> >                                                              : postgres=CTc/postgres
>>> >  test      | salmonix | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
>>> > (5 rows)
>>> >
>>>
>>> Okay, I'm not sure what the problem is there then. :S Hopefully
>>> someone else can shed some light on it for you.
>>>
>>> Thom
>>>
>>
>>
>
