From: | Vlad Romascanu <vromascanu(at)accurev(dot)com> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Cc: | Merlin Moncure <mmoncure(at)gmail(dot)com> |
Subject: | Re: Reinterpreting BYTEA as TEXT, converting BYTEA to TEXT |
Date: | 2011-03-10 14:46:55 |
Message-ID: | AANLkTinZCsMhbjCgRnEX=-pVf_YwMncOWDbFMu-==Or9@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Aha! :) Why haven't I used convert_from() -- my ultimate abominable
goal is to do an in-place migration of an SQL_ASCII database
(LC_TYPE/COLLATION="C") to UTF8 (LC_TYPE/COLLATION="C"), where the
string data in the SQL_ASCII database is in some encoding, say LATIN1,
and where the bulk of the data is ASCII-only (i.e. don't want to
"pg_dump -E sqlasciidb | pg_restore -d utf8db" when I can potentially
get away with a fraction of the I/O and disk space requirements.) So
the plan was to:
UPDATE tbl SET str=convert(str::bytea, 'LATIN1',
'UTF8')::my_varlena::text WHERE str::bytea<>convert(str::bytea,
'LATIN1', 'UTF8');
UPDATE pg_database SET encoding=pg_char_to_encoding('UTF8') WHERE
datname='sqlasciidb';
-- close all connections/backends connected to sqlasciidb
-- reopen connection
-- reindex
Of course, I could theoretically UPDATE pg_database first, restart the
backends, and then use convert_from(), but even though LC_TYPE="C" I
sort of preferred the strings to be UTF8-consistent prior to
restarting a backend on the UTF8'ed database.
So I guess the ultimate question therefore is: what is wrong, if
anything, with the above hack? :)
Thanks!
V.
On Wed, Mar 9, 2011 at 5:23 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
> I think convert_from is a little more direct:
> convert_from(string bytea, src_encoding name)
From | Date | Subject | |
---|---|---|---|
Next Message | Merlin Moncure | 2011-03-10 14:52:58 | Re: Using bytea field... |
Previous Message | mark | 2011-03-10 14:37:03 | sort mem: size in RAM vs size on Disk |