Re: Chars problem restoring to ps 8.4 (utf8) a dumped db from ps 8.1 (latin9)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Martín Marqués <martin(dot)marques(at)gmail(dot)com>
Cc: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>, Bianchi Quota Leonardo <leonardo(dot)bianchiquota(at)insiel(dot)it>, "'pgsql-general(at)postgresql(dot)org'" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Chars problem restoring to ps 8.4 (utf8) a dumped db from ps 8.1 (latin9)
Date: 2015-08-13 14:39:25
Message-ID: 20971.1439476765@sss.pgh.pa.us
Lists: pgsql-general

Martín Marqués <martin(dot)marques(at)gmail(dot)com> writes:
> On 12/08/15 at 11:12, Tom Lane wrote:
>> It does not seem likely to me that this would work at all. You're taking
>> a dump file that is full of LATIN9 data and simply asserting that it's
>> UTF8 data. That doesn't make it so. If it seemed to work, maybe that's
>> because your editor changed the encoding? Not to be relied on, for sure.

> Well, IIRC a LATIN9 encoding char which is interpreted as UTF8 will get
> inserted with no error on a UTF8 server (although the final data will be
> bogus).

I'd believe the other way around: if you tell the database that you're
using LATIN9, but what you send is really UTF8, it will not reject it
because the individual bytes are perfectly valid LATIN9 characters and
there are no cross-byte checks to make in LATIN9. But it seems highly
unlikely that LATIN9-encoded data would get past the UTF8 validity
checker with any consistency.
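
The asymmetry can be sketched in a few lines of Python. This is only an illustration, not the server's actual validation code: Python's codecs stand in for the encoding checks, and the helper name accepted_as and the sample string are assumptions made for the example.

```python
# Illustrative only: Python's codecs stand in for the server's encoding checks.
def accepted_as(raw: bytes, encoding: str) -> bool:
    """Return True if raw is a valid byte sequence in the given encoding."""
    try:
        raw.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False

name = "Martín"
utf8_bytes = name.encode("utf-8")         # two bytes for the accented letter
latin9_bytes = name.encode("iso8859-15")  # one byte (0xED) for the accented letter

# UTF8 bytes labelled LATIN9: every single byte is some LATIN9 character.
print(accepted_as(utf8_bytes, "iso8859-15"))   # True

# LATIN9 bytes labelled UTF8: 0xED announces a multi-byte sequence,
# but the following 'n' is not a continuation byte.
print(accepted_as(latin9_bytes, "utf-8"))      # False
```

A LATIN9 string can only slip past a UTF8 check if its non-ASCII bytes happen to form well-formed multi-byte sequences, which ordinary accented text almost never does.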

It's possible that the problem is one of mislabeling, ie the database
was claimed to use LATIN9 but what was actually sent was always UTF8.
If that was *always* the case then the OP's fix of changing the label
in the dump file was actually the right thing to do. But we haven't
been given enough information to be sure of that --- and if that's
what was happening, then some client software fixes would be in order
anyway, because the client code was using the wrong client_encoding.
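
As a rough sketch of that mislabeling scenario (again in Python, with a hypothetical restore helper and a sample string; it assumes the stored bytes really were UTF8 all along), compare what each label produces when the dump is read back:

```python
def restore(dump_bytes: bytes, declared_encoding: str) -> str:
    """Interpret the dump's bytes under the encoding the dump file declares."""
    return dump_bytes.decode(declared_encoding)

original = "Martín"
# Assumption: the client always sent UTF8 while claiming LATIN9,
# so the dump holds UTF8 bytes under a LATIN9 label.
dump_bytes = original.encode("utf-8")

as_latin9 = restore(dump_bytes, "iso8859-15")  # label left as dumped
as_utf8 = restore(dump_bytes, "utf-8")         # label changed by hand

print(repr(as_latin9))  # two junk characters where the accented letter was
print(repr(as_utf8))    # the intended text
```

If any rows were written by a client that really did send LATIN9, the relabelled dump would contain byte sequences that are not valid UTF8, so the relabelling is only safe when the mislabeling was universal.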

regards, tom lane
