Re: invalid byte sequence

From: Maximilian Tyrtania <lists(at)contactking(dot)de>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: invalid byte sequence
Date: 2011-03-04 14:18:58
Message-ID: 39AF8376-3E92-4E65-8686-DFE8D0E706B4@contactking.de
Lists: pgsql-general

On 04.03.2011 at 11:01, Craig Ringer wrote:

> On 04/03/11 00:02, Maximilian Tyrtania wrote:
>> After upgrading to pg 9.0.3 (from 8.4.2) on my Mac OS X 10.6.2 machine, I find this in my log file (a lot):
>>
>> <postgres%192.168.254.210%2011-03-03 16:37:30 CET%22021>STATEMENT: SELECT pg_file_read('pg_log/postgresql-2011-03-03_000000.log', 250000, $
>> <postgres%192.168.254.210%2011-03-03 16:37:32 CET%22021>ERROR: invalid byte sequence for encoding "UTF8": 0xe3bc74
>
> The "0xe3bc74" looks like gibberish in any encoding I can think of.
> What's the input file?

We are talking about PostgreSQL's own log file here; I thought that was clear from the file name in the pg_file_read call. Apparently someone on the French pgAdmin list has the very same problem. I have no idea how "0xe3bc74" made it into the log file.
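
For what it's worth, here is a small sketch of why the three bytes in the error message cannot be valid UTF-8. It uses only the Python standard library; the helper name explain_utf8_failure is made up for illustration. In UTF-8, a lead byte of the form 1110xxxx (such as 0xe3) announces a three-byte sequence, and both following bytes must have the form 10xxxxxx. 0xbc qualifies, but 0x74 (an ASCII "t") does not.

```python
# The three bytes reported in the error message: 0xe3 0xbc 0x74
raw = bytes.fromhex("e3bc74")


def explain_utf8_failure(data: bytes) -> str:
    """Hypothetical helper: report whether data decodes as UTF-8, and if not, why."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return f"invalid at byte offset {exc.start}: {exc.reason}"
    return "valid UTF-8"


# Bit-level checks on the individual bytes
is_three_byte_lead = (raw[0] & 0xF0) == 0xE0        # 0xe3 = 1110 0011
second_is_continuation = (raw[1] & 0xC0) == 0x80    # 0xbc = 1011 1100
third_is_continuation = (raw[2] & 0xC0) == 0x80     # 0x74 = 0111 0100

print(is_three_byte_lead, second_is_continuation, third_is_continuation)
print(explain_utf8_failure(raw))
```

This only shows that the sequence is malformed; it says nothing about which encoding the bytes were originally written in.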

> Is it sanely encoded? Do you know what encoding
> it is in?

As I said, I initially set lc_messages to 'de_DE-UTF8', so I assume that's the encoding the log file was written in. I have changed it to 'c' now.

> If you really want to be encoding-agnostic and you do not care if you
> get garbage data in your database that makes no sense and can never make
> any sense, then you must ensure that your database is in the "C" locale
> for LC_CTYPE and LC_COLLATE, and you must SET client_encoding =
> "SQL_ASCII" when reading the data.
>
> A suitable CREATE DATABASE command might be:
>
> CREATE DATABASE garbage
> TEMPLATE template0
> ENCODING 'SQL_ASCII' LC_COLLATE 'C' LC_CTYPE 'C';
>
> but I really don't think that's generally a good idea. Storing random
> crap in text fields will cause you pain later. Better to either convert
> the text to a sane encoding, store it as bytea if you want the raw
> bytes, or reject it.

I certainly don't want to be encoding-agnostic. I would just like to be able to read my log file using pgAdmin, which I can't right now, because pgAdmin 1.12 chops off the content after the first character that doesn't match the encoding.
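
As a possible workaround outside pgAdmin, the log file could be read as raw bytes and decoded leniently, so that one bad sequence does not hide everything after it. The following is a minimal sketch under that assumption, using only the Python standard library; the function names and the sample line are invented for illustration, and this is not what pgAdmin or pg_file_read do internally.

```python
def decode_log_lossy(data: bytes) -> str:
    """Decode as UTF-8, substituting U+FFFD for bytes that are not valid UTF-8."""
    return data.decode("utf-8", errors="replace")


def find_invalid_offsets(data: bytes) -> list:
    """Return the byte offsets at which UTF-8 decoding fails."""
    offsets = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the rest of the data is valid
        except UnicodeDecodeError as exc:
            # exc.start and exc.end are relative to the slice data[pos:]
            offsets.append(pos + exc.start)
            pos += exc.end
    return offsets


# Invented sample: a log-like line containing the bytes from the error message
sample = b"ERROR: " + bytes.fromhex("e3bc74") + b" end"

text = decode_log_lossy(sample)
bad = find_invalid_offsets(sample)
print(repr(text))
print(bad)
```

To apply this to a real file, the bytes would come from open(path, "rb").read(); the offsets then point at the places in the log worth inspecting with a hex viewer.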

Best wishes,
Max

Maximilian Tyrtania Software-Entwicklung
Dessauer Str. 6-7
10969 Berlin
http://www.contactking.de
