Re: regexp_replace and UTF8

From: "Bart Degryse" <Bart(dot)Degryse(at)indicator(dot)be>
To: <pgsql-sql(at)postgresql(dot)org>
Cc: "Jasen Betts" <jasen(at)xnet(dot)co(dot)nz>
Subject: Re: regexp_replace and UTF8
Date: 2009-02-02 08:38:09
Message-ID: 4986BF00.A3DD.0030.0@indicator.be
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Thanks for the ideas!
The function Jasen suggests works partially.
There are also entities like
&#8216;
&#8217;
&#8220;
&#8221;
&#8230;
The output of the htmlent function for these looks like |
Was that what you meant with "characters outside of the LATIN-1 space
are not handled
but apparently this is what you want." ?
Because in that case, they should be handled too.
How should that be done.
Thanks,
Bart

>>> Jasen Betts <jasen(at)xnet(dot)co(dot)nz> 2009-01-31 12:47 >>>
On 2009-01-30, Bart Degryse <Bart(dot)Degryse(at)indicator(dot)be> wrote:
>
> --=__Part8EA648F8.0__=
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: quoted-printable
>
> Hi,
> I have a text field with data like this: 'de pati&#235;nt niet'
> (without the quotes).
> I would like to convert this string to look like this: 'de patiënt
> niet'
> Basically what I need to do (I think) is
> - get rid of the &, # and ;
> - convert the number to hex
> - make a UTF8 from that (thus: \xEB)
that is not UTF8.

the UTF8 representation for 'ë' in SQL is e'\xC3\xAB' or chr(235)

your input appears to be encoded in LATIN-1 (or possibly 8859-13)

I think you'll need to write a function.

here, where I have database encoding UTF8 this appears to work as you
desire.

CREATE or replace FUNCTION htmlent(inp text) returns text as
$f$
DECLARE
str text;
BEGIN
str= regexp_replace(quote_literal( inp)
,$$&#(\d+);$$
,$$'||chr(\1)||'$$
,'g');
execute 'select '||str into str;
return str;
END
$f$
LANGUAGE PLPGSQL;

select htmlent('de pati&#235;nt niet');

probably the above should be expanded to handle named entities
like '&amp;' too.

characters outside of the LATIN-1 space are not handled
but aparently this is what you want.

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Nikhil teltia 2009-02-03 00:06:08 Function Returning a Set of Composite Value
Previous Message Pavel Stehule 2009-02-01 11:31:18 Re: dynamic OUT parameters?