Re: Encoding-related errors when moving from 7.3 to 8.0.1

From: Carlos Moreno <moreno(at)mochima(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Encoding-related errors when moving from 7.3 to 8.0.1
Date: 2005-03-22 03:31:27
Message-ID: 423F918F.3030505@mochima.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


Thanks again, Alvaro!

Alvaro Herrera wrote:

>>So, our system (CGI's written in C++ running on a Linux server)
>>simply takes whatever the user gives (properly validated and
>>escaped) and throws it in the database. We've never encountered
>>any problem (well, or perhaps it's the opposite? Perhaps we've
>>always been living with the problem without realizing it?)
>
> The latter, I think. The problem is character recoding. If your old
> system has been running with encoding SQL_ASCII, then no recoding ever
> takes place. If you are now using UTF8 or latin1 (say) as server
> encoding, then as soon as the client is using a different encoding,
> there should be conversion in order to make the new data correct w.r.t.
> the server encoding. If the wrong conversion takes place, or if no
> conversion takes place, you may either end up with invalid data, or
> have the server reject your input (as was this case.)

This makes sense to me, yes. The reason why I'm a bit lost is that
we never did anything whatsoever with respect to encoding. Oddly
enough, I couldn't find much about this in the docs.

I see references to it in the runtime configuration docs (the part
where they describe the postgres.conf file). There's one line,
commented out, where they set (as an example), the client_encoding
to sql_ascii, and a comment to the end of that line says "actually,
it defaults to the server encoding").

I just found out that in the create database statement, one of the
options specifies the encoding using for that database.

I guess what changed from version 7.4.x to 8.0 is that the default
server_encoding changed?

This means that a temporary solution (or rather, a temporary patch)
would be to create the database specifying the right server_encoding
to match what I currently have on my system? (I wouldn't want to do
that if it is nothing more than a patch to keep hiding the problem)

> So the moral of the story seems to be that yes, you need to make each
> application issue the correct client_encoding before entering any data.
> You can attach it to the user or database, by issuing ALTER USER (resp.
> DATABASE). But if you are using a web interface, where the user can
> enter data in either win1252 or latin1 encoding (or whatever) depending
> on the environment, then I'm not sure what you should do.

This is indeed the case; and "do nothing" is what we have always
done with respect to this issue... Why has it been so long without
us realizing that there was a hidden problem, I really don't know.
(and we do have users with plenty of "weird" characters -- accent
aigue, grave, circumflex, tilde, dieresis, etc. -- and they have
always worked).

I'm so lost! :-(

BTW, the correct e-mail to pass through the anti-spam filter is
my first name, followed by a dot, followed by my last name (the
rest after the @ is the same)

Thanks again for your message!

Carlos
--

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Edwin New 2005-03-22 04:35:53 Clearing locks
Previous Message Qingqing Zhou 2005-03-22 01:27:42 Re: Tracking row updates