Re: Regexp match with accented character problem

From: Laslo Forro <getforum(at)gmail(dot)com>
To: Thom Brown <thombrown(at)gmail(dot)com>
Cc: pgsql-novice(at)postgresql(dot)org
Subject: Re: Regexp match with accented character problem
Date: 2010-06-08 12:48:28
Message-ID: AANLkTikUhCF0bU-gWO9T8H-TcSjdcBb2HCkN12qYf3iN@mail.gmail.com
Lists: pgsql-novice
And one more thing:

This is all strange.

test=# select * from text where a_text ~* E'\\mmacskacic\\W\\s';
    title     |          a_text
--------------+--------------------------
 A macskacicó | A bah macskacicóca
 A macskacicó | A bah macskacicó és a ló
(2 rows)

Strange, because I would expect this to match only 'macskacic' followed by a
non-word character and then whitespace.
The corresponding Perl regexp does not match:
macskacic\W\s

I am really lost.
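
One possible explanation, offered as an assumption and not confirmed anywhere in this thread, is that the regex engine classifies the UTF-8 bytes of 'ó' one by one instead of treating it as a single letter. The Python sketch below is not PostgreSQL's regex engine; it only illustrates how the same pattern behaves on code points versus raw UTF-8 bytes. The sample string is taken from the output above, and all variable names are illustrative.

```python
import re

# 'ó' is written as an escape so the example does not depend on editor encoding.
text = "A bah macskacic\u00f3ca"

# Matching on Unicode code points: 'ó' counts as a word character,
# so there is neither a non-word character nor a word boundary after "macskacic".
unicode_nonword = re.search(r"macskacic\W", text)
unicode_boundary = re.search(r"macskacic\b", text)

# Matching on the UTF-8 bytes: 'ó' is the two bytes 0xC3 0xB3,
# and neither byte is an ASCII word character.
raw = text.encode("utf-8")
bytes_nonword = re.search(rb"macskacic\W\Wca", raw)
bytes_boundary = re.search(rb"macskacic\b", raw)

print(unicode_nonword)              # None
print(unicode_boundary)             # None
print(bytes_nonword is not None)    # True
print(bytes_boundary is not None)   # True
```

In the byte-wise case the word appears to end at the last 'c', which resembles the behaviour described in this thread, but whether the server actually works this way is not established here.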

And stop spamming.

On Tue, Jun 8, 2010 at 2:28 PM, Laslo Forro <getforum(at)gmail(dot)com> wrote:

> More:
> given the string 'macskacicóca', this matches:
> \\mmacskacic\\Wca
> and so does this:
> \\mmacskacic\\W
> indicating that 'ó' is treated as a non-alphanumeric character. Strangely
> enough, though, this does not match:
> \\mmacskacic\\W\\M
> unless \M has a * quantifier.
>
> Any idea or hint is highly appreciated.
>
> Thanks in advance,
> Laslo
>
> On Tue, Jun 8, 2010 at 1:59 PM, Laslo Forro <getforum(at)gmail(dot)com> wrote:
>
>> Perhaps this helps:
>>
>> 'ó' matches
>> \M
>> \M\M\M
>> \.*
>>
>> but not \M\M\M\M or \M\M\M\W
>>
>> These match:
>> E'\\mmacskacicó\M*'
>> E'\\mmacskacicó\s*'
>> E'\\mmacskacicó\W*'
>>
>> but only with the * quantifier, not with the + quantifier or without any
>> quantifier. These also match:
>>
>> E'\\mmacskacicó\\Y'     (!!!)
>> E'\\mmacskacicó$'
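
A possible reading of the results with a single backslash, stated as an assumption: inside an E'' string, PostgreSQL takes a backslash followed by a character with no special escape meaning as that character itself, so E'\\mmacskacicó\M*' would reach the regex engine ending in óM*, that is, zero or more literal letters M. The Python sketch below does not run PostgreSQL. The patterns are written out by hand as they would look after such escape processing, so they are hypothetical; the sketch only shows that patterns of that shape match with * and fail with +.

```python
import re

title = "A macskacic\u00f3"  # the title value from the thread, 'ó' as an escape

# Hypothetical patterns after the backslash has been dropped:
# "\M*" has become "M*", "\s*" has become "s*", "\W*" has become "W*".
star_patterns = ["macskacic\u00f3M*", "macskacic\u00f3s*", "macskacic\u00f3W*"]
plus_patterns = ["macskacic\u00f3M+", "macskacic\u00f3s+", "macskacic\u00f3W+"]

# re.IGNORECASE stands in for the case-insensitive ~* operator.
star_hits = [re.search(p, title, re.IGNORECASE) is not None for p in star_patterns]
plus_hits = [re.search(p, title, re.IGNORECASE) is not None for p in plus_patterns]

print(star_hits)  # [True, True, True]
print(plus_hits)  # [False, False, False]
```

If this reading is right, doubling the backslash (as in the \\Y example) would be needed for the escape to reach the regex engine at all.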
>>
>> The text was typed into psql in a urxvt terminal.
>> Perhaps this is some kind of Unicode or wide-character mess?
>>
>>
>> On Tue, Jun 8, 2010 at 1:26 PM, Laslo Forro <getforum(at)gmail(dot)com> wrote:
>>
>>> The problem might be that 'ó' is not recognized as \w.
>>> Actually, I do not know which class 'ó' is in. This matches:
>>>
>>> test=# select * from texts where title ~* E'\\mmacskacic\\M';
>>>     title     |           a_text
>>> --------------+----------------------------
>>>  A macskacicó | A blah blah macskacicónak.
>>> (1 row)
>>>
>>> As if the end of the word were at the last 'c'. ???
>>>
>>> If the hex code of 'ó' is 97 (decimal 151), could someone give me a hint
>>> on how to insert it into the expression?
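
As a reference point for the code value, here is a small Python sketch (not PostgreSQL) that prints how the precomposed character 'ó' is numbered in a few encodings. It assumes the text uses the single code point U+00F3 and not 'o' plus a combining accent. Since the databases listed in this thread are UTF8, the two-byte UTF-8 form is the one that would be stored under that assumption; the value 0x97 corresponds to the Mac OS Roman code page.

```python
import re

o_acute = "\u00f3"  # LATIN SMALL LETTER O WITH ACUTE

code_point = ord(o_acute)
utf8_bytes = o_acute.encode("utf-8")
mac_roman_bytes = o_acute.encode("mac_roman")

print(hex(code_point))    # 0xf3
print(utf8_bytes)         # b'\xc3\xb3'
print(mac_roman_bytes)    # b'\x97'

# In Python the character can be placed in a pattern through its escape;
# how to do the same in a PostgreSQL string literal depends on the server version.
pattern = "macskacic\u00f3"
found = re.search(pattern, "A macskacic\u00f3") is not None
print(found)              # True
```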
>>>
>>> On Tue, Jun 8, 2010 at 1:17 PM, Laslo Forro <getforum(at)gmail(dot)com> wrote:
>>>
>>>> Thanks a lot, anyway!
>>>>
>>>>
>>>> On Tue, Jun 8, 2010 at 12:56 PM, Thom Brown <thombrown(at)gmail(dot)com> wrote:
>>>>
>>>>> On 8 June 2010 11:54, Laslo Forro <getforum(at)gmail(dot)com> wrote:
>>>>> > test=# \l
>>>>> >                                   List of databases
>>>>> >    Name    |  Owner   | Encoding |  Collation  |    Ctype    |   Access privileges
>>>>> > -----------+----------+----------+-------------+-------------+-----------------------
>>>>> >  postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
>>>>> >  template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres
>>>>> >                                                              : postgres=CTc/postgres
>>>>> >  template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres
>>>>> >                                                              : postgres=CTc/postgres
>>>>> >  test      | salmonix | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
>>>>> > (5 rows)
>>>>> >
>>>>>
>>>>> Okay, I'm not sure what the problem is there then. :S  Hopefully
>>>>> someone else can shed some light on it for you.
>>>>>
>>>>> Thom
>>>>>
>>>>
>>>>
>>>
>>
>
