Re: Locale/encoding problem/question

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: henka(at)cityweb(dot)co(dot)za
Cc: "Martijn van Oosterhout" <kleptog(at)svana(dot)org>, pgsql-general(at)postgresql(dot)org
Subject: Re: Locale/encoding problem/question
Date: 2006-08-04 12:59:51
Message-ID: 5345.1154696391@sss.pgh.pa.us
Lists: pgsql-general

henka(at)cityweb(dot)co(dot)za writes:
>> It should be in the dump file, almost the first line. Locale is of no
>> interest to pg_dump, you'll have to decide how you want it.

> Yes: UTF-8 and the other is LATIN1

Note that this represents what the original server *thought* the
encoding was. But it's not at all impossible that the server thought
the data was LATIN1 when it was really UTF8. (The other way around is
less plausible because the server would have been able to detect
encoding errors.) If you were using clients that treated the data
as UTF8 without paying attention to what the server thought, you'd
not have realized you were mislabeling the data.

But, if you tried to load data marked as LATIN1 into a server using
UTF8, it'd have applied a LATIN1 to UTF8 conversion, and then
everything's hosed.
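
To illustrate that failure mode, here is a minimal Python sketch (not
PostgreSQL code; the sample string "café" is just an assumed example) of
what a LATIN1 to UTF8 conversion does to bytes that were already UTF8:

```python
# Text that is really UTF8 but was labeled LATIN1 (sample string is an assumption).
original = "café"
utf8_bytes = original.encode("utf-8")  # b'caf\xc3\xa9'

# A LATIN1 -> UTF8 conversion treats every byte as one LATIN1 character
# and re-encodes it, so the two-byte sequence for "é" becomes four bytes.
double_encoded = utf8_bytes.decode("latin-1").encode("utf-8")

print(utf8_bytes)                      # b'caf\xc3\xa9'
print(double_encoded)                  # b'caf\xc3\x83\xc2\xa9'
print(double_encoded.decode("utf-8"))  # cafÃ©
```

The result is still valid UTF8, so the server accepts it without
complaint, but every non-ASCII character is now stored as two wrong
characters.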

I'd suggest actually inspecting the data in the dump file: it's not that
hard to tell UTF8 from LATIN1 if you look at the byte sequences.
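
One rough way to do that inspection is sketched below in Python. The
function name guess_encoding is hypothetical, and the check is only a
heuristic: pure ASCII data says nothing either way, and LATIN1 text can
in rare cases happen to form valid UTF8 sequences.

```python
def guess_encoding(data: bytes) -> str:
    """Heuristically classify raw bytes taken from a dump file."""
    # Bytes below 0x80 are identical in ASCII, LATIN1 and UTF8.
    if all(b < 0x80 for b in data):
        return "ASCII"
    try:
        # A strict decode fails on any byte sequence that is not valid UTF8,
        # such as a lone LATIN1 accented character (e.g. 0xE9).
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "not UTF8 (possibly LATIN1)"
    return "UTF8"


print(guess_encoding("café".encode("utf-8")))    # UTF8
print(guess_encoding("café".encode("latin-1")))  # not UTF8 (possibly LATIN1)
```

To apply it to a real dump, read the file in binary mode and pass the
bytes in; for large files it would make sense to test line by line and
report the first line that fails.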

Or you could just take the file marked LATIN1, edit it to change the
client_encoding setting to say the data is UTF8, and see if you can
load it. If it's not UTF8, 8.1.4 will almost certainly detect a ton of
encoding errors.
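
The relabeling edit could be scripted along these lines. This is only a
sketch: it assumes the dump contains a line of the form
SET client_encoding = 'LATIN1'; (check your own file first), and the
function name relabel_client_encoding is made up for illustration. It
works on bytes so that the data itself is never decoded or altered.

```python
import re


def relabel_client_encoding(dump: bytes, new_encoding: str = "UTF8") -> bytes:
    """Rewrite the first client_encoding setting in a plain-text dump."""
    # Assumed line format: SET client_encoding = 'LATIN1';
    pattern = re.compile(rb"^(SET client_encoding = ')[^']*(';)", re.MULTILINE)
    replacement = rb"\g<1>" + new_encoding.encode("ascii") + rb"\g<2>"
    # Only the label changes; every other byte of the dump is left untouched.
    return pattern.sub(replacement, dump, count=1)


sample = b"--\nSET client_encoding = 'LATIN1';\nSELECT 1;\n"
print(relabel_client_encoding(sample).decode("ascii"))
```

Write the result to a new file rather than overwriting the original
dump, so the unmodified copy is still there if the load fails.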

regards, tom lane
