Re: Regexp match with accented character problem

From: Laslo Forro <getforum(at)gmail(dot)com>
To: Thom Brown <thombrown(at)gmail(dot)com>
Cc: pgsql-novice(at)postgresql(dot)org
Subject: Re: Regexp match with accented character problem
Date: 2010-06-08 12:28:57
Message-ID: AANLkTiknHlEJT-tQIgHWYmF7zjZ5knoCUFXnVS8RezHQ@mail.gmail.com
Lists: pgsql-novice

More:
Given the string 'macskacicóca', this pattern matches:
\\mmacskacic\\Wca'
and so does:
\\mmacskacic\\W'
which indicates that 'ó' is treated as a non-alphanumeric character. Strangely enough,
this one does not match:
\\mmacskacic\\W\\M
unless \M is given a * quantifier.

Any idea or hint is highly appreciated.
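
The symptom described above (the pattern matching as if 'ó' were a non-word character) can be reproduced outside PostgreSQL. What follows is only a rough sketch in Python, not PostgreSQL's regex engine: Python has no \m or \M, so \b (any word boundary) stands in for both, and the re.ASCII flag stands in for an engine that only knows ASCII letters as word characters. Whether that is what actually happens on the server is an assumption.

```python
import re

# The test string from this thread, with 'ó' written as U+00F3 to avoid
# any ambiguity about how the character was typed.
text = "macskacic\u00f3ca"

# Unicode-aware matching (Python's default for str patterns):
# 'ó' counts as a word character, so \W cannot match it and there is
# no word boundary between 'c' and 'ó'.
unicode_nonword = re.search(r"\bmacskacic\W", text)
unicode_boundary = re.search(r"\bmacskacic\b", text)

# ASCII-only matching: 'ó' is outside [a-zA-Z0-9_], so it is matched by \W
# and a word boundary appears right after the last 'c'.
ascii_nonword = re.search(r"\bmacskacic\W", text, flags=re.ASCII)
ascii_boundary = re.search(r"\bmacskacic\b", text, flags=re.ASCII)

print("unicode \\W:", unicode_nonword)
print("unicode \\b:", unicode_boundary)
print("ascii   \\W:", ascii_nonword)
print("ascii   \\b:", ascii_boundary)
```

The ASCII-only results line up with the observations in this thread (the word seems to end at the last 'c', and 'ó' matches \W), which would be consistent with the character not being classified as a letter.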

Thanks in advance,
Laslo

On Tue, Jun 8, 2010 at 1:59 PM, Laslo Forro <getforum(at)gmail(dot)com> wrote:

> Perhaps this helps:
>
> 'ó' matches
> \M
> \M\M\M
> \.*
>
> but not \M\M\M\M or \M\M\M\W
>
> These match:
> E'\\mmacskacicó\M*'
> E'\\mmacskacicó\s*'
> E'\\mmacskacicó\W*'
>
> with the * quantifier, but not with the + quantifier or without any quantifier.
> Also matches:
>
> E'\\mmacskacicó\\Y' (!!!)
> E'\\mmacskacicó$'
>
> The text is typed via psql in a urxvt terminal.
> Perhaps it is some kind of Unicode / wide-character mess?
>
>
> On Tue, Jun 8, 2010 at 1:26 PM, Laslo Forro <getforum(at)gmail(dot)com> wrote:
>
>> The problem might be that 'ó' is not recognized as \w.
>> Actually, I do not know which class 'ó' is in. This matches:
>>
>> test=# select * from texts where title ~* E'\\mmacskacic\\M';
>> title | a_text
>> --------------+----------------------------
>> A macskacicó | A blah blah macskacicónak.
>> (1 row)
>>
>> As if the end of the word were at the last 'c'. ???
>>
>> If the hex code of 'ó' is 97 (dec. 151), could someone give me a hint on how to
>> insert it into the expression?
>>
>> On Tue, Jun 8, 2010 at 1:17 PM, Laslo Forro <getforum(at)gmail(dot)com> wrote:
>>
>>> Thanks a lot, anyway!
>>>
>>>
>>> On Tue, Jun 8, 2010 at 12:56 PM, Thom Brown <thombrown(at)gmail(dot)com> wrote:
>>>
>>>> On 8 June 2010 11:54, Laslo Forro <getforum(at)gmail(dot)com> wrote:
>>>> > test=# \l
>>>> >                          List of databases
>>>> >  Name      | Owner    | Encoding | Collation   | Ctype       | Access privileges
>>>> > -----------+----------+----------+-------------+-------------+-----------------------
>>>> >  postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
>>>> >  template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres
>>>> >                                                              : postgres=CTc/postgres
>>>> >  template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres
>>>> >                                                              : postgres=CTc/postgres
>>>> >  test      | salmonix | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
>>>> > (5 rows)
>>>> >
>>>> >
>>>>
>>>> Okay, I'm not sure what the problem is there then. :S Hopefully
>>>> someone else can shed some light on it for you.
>>>>
>>>> Thom
>>>>
>>>
>>>
>>
>
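
On the quoted question about the code of 'ó': a small Python sketch, assuming the character is the precomposed U+00F3 and that the database really stores UTF-8 (as the \l output suggests). It only shows what the character looks like numerically. It does not explain the matching behaviour.

```python
# 'ó' as the single precomposed Unicode character U+00F3.
o_acute = "\u00f3"

# Its Unicode code point, in decimal and hex.
code_point = ord(o_acute)

# Its byte representation in UTF-8 (the database encoding in this thread)
# and in Latin-1, for comparison.
utf8_bytes = o_acute.encode("utf-8")
latin1_bytes = o_acute.encode("latin-1")

print("code point:", code_point, hex(code_point))
print("UTF-8 bytes:", utf8_bytes.hex(" "))
print("Latin-1 bytes:", latin1_bytes.hex(" "))
```

In Unicode the code point is therefore F3 hex (243 decimal), not 97 hex, and in UTF-8 the character takes two bytes. In a UTF8 database, PostgreSQL's chr() treats its argument as a Unicode code point, so chr(243) concatenated into the pattern would be one way to insert the character without typing it.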
