| From: | PFC <lists(at)boutiquenumerique(dot)com> | 
|---|---|
| To: | "Rajesh Mallah" <mallah_rajesh(at)yahoo(dot)com>, pgsql-sql(at)postgresql(dot)org | 
| Subject: | Re: Significance of Database Encoding | 
| Date: | 2005-05-15 19:48:47 | 
| Message-ID: | op.sqt1blpqth1vuj@localhost | 
| Lists: | pgsql-sql | 
> $ iconv -f US-ASCII -t UTF-8  < test.sql > out.sql
> iconv: illegal input sequence at position 114500
>
> Any ideas how the job can be accomplished reliably?
>
> Also my database may contain data in multiple encodings
> like WINDOWS-1251 and WINDOWS-1256 in various places
> as data has been inserted by different people using
> different sources and client software.
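As background on the iconv error: US-ASCII covers only bytes 0x00-0x7F, so the first byte outside that range in the dump is what iconv rejects as an "illegal input sequence" at that position. A small illustration (the sample bytes are made up):

```python
# US-ASCII only covers bytes 0x00-0x7F, so any high byte (e.g. from
# WINDOWS-1251 or WINDOWS-1256 text) makes an ASCII decode fail,
# just like iconv's "illegal input sequence at position N".
data = b"ascii text \xe9 more text"   # \xe9 is a made-up non-ASCII byte

try:
    data.decode("ascii")
except UnicodeDecodeError as e:
    # e.start is the offset of the first offending byte.
    print("illegal input sequence at position", e.start)
```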
You could use a simple program like this (in Python):
# Transcode a dump whose lines may be in different encodings:
# try each candidate encoding in turn and emit UTF-8.
output = open( "unidump", "w" )
for line in open( "your dump" ):
	for encoding in "utf-8", "iso-8859-15", "whatever":
		try:
			# Decode with this encoding, re-encode as UTF-8.
			output.write( unicode( line, encoding ).encode( "utf-8" ))
			break
		except UnicodeError:
			pass
	else:
		# No candidate encoding accepted this line.
		print "No suitable encoding for line..."
output.close()
	I'd say this might work, provided UTF-8 cannot absorb an apostrophe inside a multibyte character. Can it?
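As it happens, the answer to that question is no: in UTF-8, every byte of a multi-byte sequence has its high bit set (lead bytes are 0xC2-0xF4, continuation bytes 0x80-0xBF), so an ASCII apostrophe (byte 0x27) can never appear inside one. A quick check:

```python
# Every byte of a UTF-8 multi-byte sequence is >= 0x80, so pure ASCII
# characters such as the apostrophe (0x27) never occur inside one.
for ch in (u"\u00e9", u"\u0416", u"\u20ac"):   # 2-, 2- and 3-byte examples
    encoded = ch.encode("utf-8")
    assert all(b >= 0x80 for b in encoded), (ch, encoded)
print("no ASCII bytes inside any multi-byte UTF-8 sequence")
```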
	Or you could do that to all your tables using SELECTs, but it's going to be painful...
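For what it's worth, the same fallback-decoding idea can be sketched in modern Python 3 as a small function. The encoding list here is only illustrative; note that single-byte encodings such as WINDOWS-1251 accept almost any byte sequence, so UTF-8 must be tried first, and the order of the rest decides which interpretation wins for ambiguous lines:

```python
def decode_fallback(raw_line, encodings=("utf-8", "windows-1251", "windows-1256")):
    """Try each candidate encoding in turn; return the decoded text,
    or None if no candidate accepts the bytes. UTF-8 must come first,
    because single-byte encodings accept nearly any byte sequence."""
    for enc in encodings:
        try:
            return raw_line.decode(enc)
        except UnicodeDecodeError:
            pass
    return None

# Applying it to a dump (file names are placeholders):
# with open("your_dump.sql", "rb") as src, \
#      open("unidump.sql", "w", encoding="utf-8") as out:
#     for raw in src:
#         text = decode_fallback(raw)
#         if text is not None:
#             out.write(text)
```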
| | From | Date | Subject |
|---|---|---|---|
| Next Message | Rajesh Mallah | 2005-05-16 02:16:50 | Re: Significance of Database Encoding | 
| Previous Message | Rajesh Mallah | 2005-05-15 18:38:29 | Re: Significance of Database Encoding |