Re: Reinterpreting BYTEA as TEXT, converting BYTEA to TEXT

From: Vlad Romascanu <vromascanu(at)accurev(dot)com>
To: pgsql-general(at)postgresql(dot)org
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>
Subject: Re: Reinterpreting BYTEA as TEXT, converting BYTEA to TEXT
Date: 2011-03-10 14:46:55
Message-ID: AANLkTinZCsMhbjCgRnEX=-pVf_YwMncOWDbFMu-==Or9@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Aha! :) Why haven't I used convert_from() -- my ultimate abominable
goal is to do an in-place migration of an SQL_ASCII database
(LC_TYPE/COLLATION="C") to UTF8 (LC_TYPE/COLLATION="C"), where the
string data in the SQL_ASCII database is in some encoding, say LATIN1,
and where the bulk of the data is ASCII-only (i.e. don't want to
"pg_dump -E sqlasciidb | pg_restore -d utf8db" when I can potentially
get away with a fraction of the I/O and disk space requirements.) So
the plan was to:

UPDATE tbl SET str=convert(str::bytea, 'LATIN1',
'UTF8')::my_varlena::text WHERE str::bytea<>convert(str::bytea,
'LATIN1', 'UTF8');
UPDATE pg_database SET encoding=pg_char_to_encoding('UTF8') WHERE
datname='sqlasciidb';
-- close all connections/backends connected to sqlasciidb
-- reopen connection
-- reindex

Of course, I could theoretically UPDATE pg_database first, restart the
backends, and then use convert_from(), but even though LC_TYPE="C" I
sort of preferred the strings to be UTF8-consistent prior to
restarting a backend on the UTF8'ed database.

So I guess the ultimate question therefore is: what is wrong, if
anything, with the above hack? :)

Thanks!
V.

On Wed, Mar 9, 2011 at 5:23 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
> I think convert_from is a little more direct:
> convert_from(string bytea, src_encoding name)

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Merlin Moncure 2011-03-10 14:52:58 Re: Using bytea field...
Previous Message mark 2011-03-10 14:37:03 sort mem: size in RAM vs size on Disk