Re: UTF8 regexp and char classes still does not work

From:	Sergey Burladyan <eshkinkot(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: UTF8 regexp and char classes still does not work
Date:	2010-09-28 22:37:35
Message-ID:	8739sta40g.fsf@home.progtech.ru
Lists:	pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Hmm, you're right. I only tested that on Latin1 characters, for which
> it does work because those have Unicode points below 256. I'm not
> sure of a reasonable solution for the general case --- we certainly
> don't want this function iterating up to 2^21 or thereabouts.

Yes, I understand the problem. How does Perl do this? Maybe the Unicode table could
be precomputed, or linked into the postgres binary from an external source?
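
To make the precomputed-table idea concrete, here is a rough sketch in Python rather than in the backend's C. It is not how postgres or Perl actually do it; the names build_alpha_ranges, ALPHA_RANGES and is_alpha are made up for illustration, and "alpha" is assumed to mean Unicode general category L*. The full scan over all code points happens once, at build time, and each lookup afterwards is a binary search over sorted ranges instead of an iteration up to 2^21.

```python
import bisect
import sys
import unicodedata


def build_alpha_ranges():
    """Build-time step: scan every code point once and collapse the
    letters (general category L*) into sorted, inclusive (lo, hi) ranges."""
    ranges = []
    start = None
    for cp in range(sys.maxunicode + 1):
        letter = unicodedata.category(chr(cp)).startswith("L")
        if letter and start is None:
            start = cp                      # a new run of letters begins
        elif not letter and start is not None:
            ranges.append((start, cp - 1))  # the run ended just before cp
            start = None
    if start is not None:
        ranges.append((start, sys.maxunicode))
    return ranges


# Hypothetical precomputed table; in a C program this would be a
# generated static array compiled or linked into the binary.
ALPHA_RANGES = build_alpha_ranges()
ALPHA_STARTS = [lo for lo, _ in ALPHA_RANGES]


def is_alpha(cp):
    """Run-time step: binary search for the range that could contain cp."""
    i = bisect.bisect_right(ALPHA_STARTS, cp) - 1
    return i >= 0 and cp <= ALPHA_RANGES[i][1]
```

With such a table, is_alpha(0x0436) (CYRILLIC SMALL LETTER ZHE) costs a handful of comparisons, the same as for a Latin1 character.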

> Your test case seems to be using KOI8 encoding, though, which doesn't
> have anything to do with UTF8 behavior.

That was just an example of the expected result. See the first test: it is UTF8, two bytes per character:
> > --- CYRILLIC SMALL LETTER ZHE ~* CYRILLIC CAPITAL LETTER ZHE
> > select E'\320\266' ~* E'\320\226', E'\320\266' ~ '[[:alpha:]]+', 'g' ~ '[[:alpha:]]+';
> > ?column? | ?column? | ?column?
> > ----------+----------+----------
> > t | f | t
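
As a quick check of what those octal escapes contain, the following Python sketch (outside postgres, and only an illustration) decodes the same two byte pairs as UTF8. Python's own isalpha() is used here as a stand-in for what [[:alpha:]] would be expected to match; that equivalence is an assumption, not something the regex code guarantees.

```python
import unicodedata

# The same byte sequences as in the query, written with octal escapes.
zhe_small = b"\320\266".decode("utf-8")    # bytes 0xD0 0xB6
zhe_capital = b"\320\226".decode("utf-8")  # bytes 0xD0 0x96

for ch in (zhe_small, zhe_capital):
    # Print the code point, its Unicode name, and whether Python
    # classifies it as alphabetic.
    print("U+%04X" % ord(ch), unicodedata.name(ch), ch.isalpha())
```

Both characters have code points above 255, which is why a test limited to Latin1 characters would not have shown the problem.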

--
Sergey Burladyan

In response to

Re: UTF8 regexp and char classes still does not work at 2010-09-28 22:00:52 from Tom Lane
