regexp_replace and UTF8

From: "Bart Degryse" <Bart(dot)Degryse(at)indicator(dot)be>
To: <pgsql-sql(at)postgresql(dot)org>
Subject: regexp_replace and UTF8
Date: 2009-01-30 10:26:48
Message-ID: 4982E3F7.A3DD.0030.0@indicator.be
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Hi,
I have a text field with data like this: 'de pati&#235;nt niet'
(without the quotes).
I would like to convert this string to look like this: 'de patiënt
niet'
Basically what I need to do (I think) is
- get rid of the &, # and ;
- convert the number to hex
- make a UTF8 from that (thus: \xEB)
- convert that to SQL_ASCII
Since this works:
select regexp_replace('de pati&#235;nt niet', '&#(\\d+);' (
file://d+);'/ ), '\xEB', 'g')
I was thinking that this would work too, but it doesn't
select regexp_replace('de pati&#235;nt niet', '&#(\\d+);' (
file://\d+);' ), '\x' || to_hex(E'\\1'), 'g')
It gives me:
ERROR: function to_hex("unknown") is not unique
HINT: Could not choose a best candidate function. You may need to
add explicit type casts.
So I changed it to
select regexp_replace('de pati&#235;nt niet', '&#(\\d+);' (
file://\d+);' ), '\x' || to_hex(CAST (to_number(E'\\1','999') AS
INTEGER)), 'g')
which kind of works except that the result is not what I need:
'de patix1nt niet'
Can anyone help me fix this or point me to a better approach.
By the way, changing the way data is put into the field is
unfortunately not an option.
Many thanks in advance.

ICT Departement - Indicator NV
Bart Degryse

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Gerardo Herzig 2009-01-30 11:01:24 dynamic OUT parameters?
Previous Message Terry Fielder 2009-01-29 15:16:28 Re: I need some magical advice