Matching uppercased Russian words (\x0410-\x042F) in UTF8 database 8.4.13

From: Alexander Farber <alexander(dot)farber(at)gmail(dot)com>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Matching uppercased russian words (\x0410-\x042F) in UTF8 database 8.4.13
Date: 2013-03-19 15:10:46
Message-ID: CAADeyWjZUQU-mwN30rxZs_2A_HvBzrtWwd9kX+c+hmo5kfmN+w@mail.gmail.com
Lists: pgsql-general

Hello,

I have prepared an SQL fiddle for my question:
http://sqlfiddle.com/#!11/8a494/4

And also described it in more detail at
http://stackoverflow.com/questions/15500270/string-matching-in-insert-trigger-how-to-use-in-conditionals-to-return-null

Does anybody know how to match the UTF8 range \x0410-\x042F
(uppercase Russian letters) in the regular expression in my code below?

I've tried both
new.word !~ '^[\x0410-\x042F]{2,}$'
(fails with syntax error) and
new.word !~ '^[\u0410-\u042F]{2,}$'
(triggers even for correct words):

create table good_words (
    word varchar(64) primary key
);

create or replace function keep_clean() returns trigger as $body$
begin
    new.word := upper(new.word);

    /* next line does not compile? */
    IF new.word !~ '^[\x0410-\x042F]{2,}$' THEN
        RAISE EXCEPTION 'Not an uppercased Russian word in UTF8';
    END IF;

    IF new.word ~ '^[ЪЫЬ]' OR new.word ~ 'Ъ$' THEN
        return NULL;
    END IF;

    /* does not return NULL for 'ошибббка'? */
    IF new.word ~ '(.)\1\1' AND new.word NOT LIKE '%ШЕЕЕ%'
        AND new.word NOT LIKE '%ЗМЕЕЕ%' THEN
        return NULL;
    END IF;

    return new;
end;
$body$ language plpgsql;
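
One possible way to express the two regular expression checks, offered as a
sketch rather than a confirmed fix. It assumes a UTF8 database, a client
encoding that passes Cyrillic literals through unchanged, and the 8.4 default
standard_conforming_strings = off, under which a backslash inside an ordinary
'...' literal is consumed by the string parser before the regular expression
engine sees it. Under those assumptions the range can be written with literal
characters (А is U+0410, Я is U+042F), and the back-reference needs an E''
string with doubled backslashes. The sample words are made up for illustration.

```sql
-- Sketch only; the sample words are illustrative and not taken from the fiddle.

-- Range U+0410..U+042F written with literal characters
-- instead of \x or \u escapes.
SELECT 'СЛОВО' ~ '^[А-Я]{2,}$' AS looks_russian;

-- Back-reference: E'' string with doubled backslashes,
-- so that the regex engine receives \1 and not an escaped byte.
SELECT 'ОШИБББКА' ~ E'(.)\\1\\1' AS has_triple_letter;
```

Inside the trigger these would become new.word !~ '^[А-Я]{2,}$' and
new.word ~ E'(.)\\1\\1'. Ё (U+0401) lies outside this range, just as it does
in the original \x0410-\x042F.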

Thank you
Alex
