| From: | PFC <lists(at)boutiquenumerique(dot)com> | 
|---|---|
| To: | "Rajesh Mallah" <mallah_rajesh(at)yahoo(dot)com>, pgsql-sql(at)postgresql(dot)org | 
| Subject: | Re: Significance of Database Encoding | 
| Date: | 2005-05-15 19:48:47 | 
| Message-ID: | op.sqt1blpqth1vuj@localhost | 
| Lists: | pgsql-sql | 
> $ iconv -f US-ASCII -t UTF-8  < test.sql > out.sql
> iconv: illegal input sequence at position 114500
>
> Any ideas how the job can be accomplished reliably?
>
> Also my database may contain data in multiple encodings
> like WINDOWS-1251 and WINDOWS-1256 in various places
> as data has been inserted by different people using
> different sources and client software.
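As background on the iconv error: US-ASCII covers only bytes 0x00-0x7F, so the first byte outside that range in the dump is what iconv rejects as an "illegal input sequence" at that position. A small illustration (the sample bytes are made up):

```python
# US-ASCII only covers bytes 0x00-0x7F, so any high byte (e.g. from
# WINDOWS-1251 or WINDOWS-1256 text) makes an ASCII decode fail,
# just like iconv's "illegal input sequence at position N".
data = b"ascii text \xe9 more text"   # \xe9 is a made-up non-ASCII byte

try:
    data.decode("ascii")
except UnicodeDecodeError as e:
    # e.start is the offset of the first offending byte.
    print("illegal input sequence at position", e.start)
```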
You could use a simple program like this (in Python):
# Transcode a dump whose lines may be in different encodings:
# try each candidate encoding in turn and emit UTF-8.
output = open( "unidump", "w" )
for line in open( "your dump" ):
	for encoding in "utf-8", "iso-8859-15", "whatever":
		try:
			# Decode with this encoding, re-encode as UTF-8.
			output.write( unicode( line, encoding ).encode( "utf-8" ))
			break
		except UnicodeError:
			pass
	else:
		# No candidate encoding accepted this line.
		print "No suitable encoding for line..."
output.close()
	I'd say this might work, provided UTF-8 cannot absorb an apostrophe inside a multibyte character. Can it?
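As it happens, the answer to that question is no: in UTF-8, every byte of a multi-byte sequence has its high bit set (lead bytes are 0xC2-0xF4, continuation bytes 0x80-0xBF), so an ASCII apostrophe (byte 0x27) can never appear inside one. A quick check:

```python
# Every byte of a UTF-8 multi-byte sequence is >= 0x80, so pure ASCII
# characters such as the apostrophe (0x27) never occur inside one.
for ch in (u"\u00e9", u"\u0416", u"\u20ac"):   # 2-, 2- and 3-byte examples
    encoded = ch.encode("utf-8")
    assert all(b >= 0x80 for b in encoded), (ch, encoded)
print("no ASCII bytes inside any multi-byte UTF-8 sequence")
```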
	Or you could do that to all your tables using SELECTs, but it's going to be painful...
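For what it's worth, the same fallback-decoding idea can be sketched in modern Python 3 as a small function. The encoding list here is only illustrative; note that single-byte encodings such as WINDOWS-1251 accept almost any byte sequence, so UTF-8 must be tried first, and the order of the rest decides which interpretation wins for ambiguous lines:

```python
def decode_fallback(raw_line, encodings=("utf-8", "windows-1251", "windows-1256")):
    """Try each candidate encoding in turn; return the decoded text,
    or None if no candidate accepts the bytes. UTF-8 must come first,
    because single-byte encodings accept nearly any byte sequence."""
    for enc in encodings:
        try:
            return raw_line.decode(enc)
        except UnicodeDecodeError:
            pass
    return None

# Applying it to a dump (file names are placeholders):
# with open("your_dump.sql", "rb") as src, \
#      open("unidump.sql", "w", encoding="utf-8") as out:
#     for raw in src:
#         text = decode_fallback(raw)
#         if text is not None:
#             out.write(text)
```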
| | From | Date | Subject |
|---|---|---|---|
| Next Message | Rajesh Mallah | 2005-05-16 02:16:50 | Re: Significance of Database Encoding | 
| Previous Message | Rajesh Mallah | 2005-05-15 18:38:29 | Re: Significance of Database Encoding |