Re: UTF-8 and Regular expression

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Håvard Wahl Kongsgård <haavard(dot)kongsgaard(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: UTF-8 and Regular expression
Date: 2011-05-31 21:26:21
Message-ID: 911.1306877181@sss.pgh.pa.us
Lists: pgsql-general

Håvard Wahl Kongsgård <haavard(dot)kongsgaard(at)gmail(dot)com> writes:
> Hi, in 8.4, how do the regular expression functions in PostgreSQL handle
> special UTF-8 characters?

Badly :-(

> for example:
> SELECT name,substring(name from E'\\w+\\s(\\w+)$') from nodes;
> fails to select characters like

Should work in 9.0, but no chance in earlier releases. You need to use
a single-byte encoding such as LATIN1 if you need to do this in older
releases. In any release, make sure you're using an LC_COLLATE setting
that's appropriate for the language and encoding.
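
A minimal sketch of how one might check the relevant settings and set up
the single-byte workaround. The test string, the database name
nodes_latin1, and the locale name nb_NO.ISO-8859-1 are assumptions chosen
for illustration; locale names depend on the operating system, so
substitute whatever your platform provides.

```sql
-- Inspect the encoding and locale of the current database.
SHOW server_encoding;
SHOW lc_collate;
SHOW lc_ctype;

-- Probe the pattern from the question against a literal containing
-- non-ASCII letters (an illustrative value, not data from the nodes table).
-- If \w does not classify the accented letters as word characters,
-- the pattern will not match the whole last word.
SELECT substring('Håvard Kongsgård' from E'\\w+\\s(\\w+)$');

-- Workaround for pre-9.0 servers: a database with a single-byte encoding.
-- The database and locale names below are hypothetical.
-- template0 is needed when the encoding or locale differs from template1.
CREATE DATABASE nodes_latin1
    WITH TEMPLATE template0
         ENCODING 'LATIN1'
         LC_COLLATE 'nb_NO.ISO-8859-1'
         LC_CTYPE 'nb_NO.ISO-8859-1';
```

LC_CTYPE is set alongside LC_COLLATE here on the assumption that both
should agree with the database encoding; the per-database LC_COLLATE and
LC_CTYPE options of CREATE DATABASE are available from 8.4 onward.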

regards, tom lane
