Re: convert text field to utf8 in sql_ascii database

From: "Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To: "Andy Colson *EXTERN*" <andy(at)squeakycode(dot)net>, "PostgreSQL" <pgsql-general(at)postgresql(dot)org>
Subject: Re: convert text field to utf8 in sql_ascii database
Date: 2011-11-18 08:50:37
Message-ID: D960CB61B694CF459DCFB4B0128514C207201101@exadv11.host.magwien.gv.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Andy Colson wrote:
> I am in the middle of a process to get all my data into utf8. As its
> not all converted yet, my database encoding is SQL_ASCII.
>
> I am getting external apps fixed up to write utf8 to the database, and
> so far so good. But, I ran across some stuff that needs a one time
> convert, and wanted to just write sql to update it.
>
> I cant seem to figure it out. I'm on Slackware 64, PG 9.0.4.
>
> I tried:
> update general set legal = convert(legal, 'windows_1250', 'utf8');
>
> ERROR: function convert(text, unknown, unknown) does not exist
> HINT: No function matches the given name and argument types. You might
> need to add explicit type casts.
>
>
> It looks like convert() only does bytea, and legal is defined as text.
> Can't seem to cast it either (as legal::bytea). The result is bytea, so
> again would have to cast/convert?

Right.

> Also, I'd like to add, I dont understand the online help:
> http://www.postgresql.org/docs/9.0/static/functions-string.html
>
> shows convert(string, src, dest), but then says see Table 9-7, which has
> conversion names, like windows_1250_to_utf8, where do I use that?

You can only convert from encoding A to encoding B if there is a
corresponding A_to_B conversion available.

> The help also shows functions convert_to() and convert_from(). But I
> dont understand how to use them.
>
> update general set legal = convert_to(legal, 'utf8');
>
> How does it know what source encoding to use? And it converts to bytea
> so how do I convert it back to text?

convert_to() and convert_from() can't do any conversion in an SQL_ASCII
database because there is no well-defined database encoding in that case.
They will "cast" between text and bytea but leave the contents unmodified.

You'll have to employ convert() to do the actual conversion.

If, for example, you know that a column is in WIN1252 encoding, you could
UPDATE tab SET col =
convert_from(
convert(
convert_to('schön', 'WIN1252'), 'WIN1252', 'UTF8'
), 'UTF8'
);

Yours,
Laurenz Albe

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Greg Smith 2011-11-18 10:39:46 Re: Performance degradation 8.4 -> 9.1
Previous Message John R Pierce 2011-11-18 07:30:41 Re: pg_dump error