BUG #3697: utf8 issue: can not reimport a table that was successfully exported.

From: "Marc Mamin" <m(dot)mamin(at)intershop(dot)de>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #3697: utf8 issue: can not reimport a table that was successfully exported.
Date: 2007-10-25 14:49:16
Message-ID: 200710251449.l9PEnGkQ026857@wwwmaster.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs


The following bug has been logged online:

Bug reference: 3697
Logged by: Marc Mamin
Email address: m(dot)mamin(at)intershop(dot)de
PostgreSQL version: 8.2.4
Operating system: SuSE Linux 9.1 (i586)
Description: utf8 issue: can not reimport a table that was
successfully exported.
Details:

Hello,

I'm not sure this is a bug; the problem might be related to a client
encoding issue.

My Database is on a Linux server which I connect to using putty from
Windows.

both server and client are set to UTF8:

client_encoding | UTF8
backslash_quote | safe_encoding
server_encoding | UTF8

I stumbled on this issue while trying to import a "malicious" user agent
string...

I didn't check if all characters are valid UTF8...

My concern is about database recovery.
I'm using pg_dump to regulary export my users, bu according to the example
below,
it seems that my dumps may be worthless !

May be you should consider not to publish this before a fix exist
as this is a serious issue which could eventually be exploited
to damage existing instances (for the case this is really a bug)...

regards,

Marc Mamin

steps to repeat:

CREATE TABLE utf8_test(s varchar);

CREATE OR REPLACE FUNCTION f_utf8_test( st VARCHAR) RETURNS INT AS $$

DECLARE
quotedline varchar = quote_literal($1);

BEGIN
INSERT INTO utf8_test ( s ) VALUES ( quotedline);
RETURN 0;
END;

$$ LANGUAGE plpgsql;

select f_utf8_test('(Mozilla/4.0 (compatible; MSIE 6.0; Wind
\xE0\xF0\xF1\xF2\xE2\xE5\xED\xED\xFB\xE9 \xE2\xFB\xF1\xF8\9
\xE3\xEE\xF1\xF3\xE4
xE4\xE6 \xCD\xC1 \xD0\xC1")');

-- here the same statement, but with all backslashed duplicated for the case
when the string was modified when posting this issue:

select f_utf8_test('(Mozilla/4.0 (compatible; MSIE 6.0; Wind
\\xE0\\xF0\\xF1\\xF2\\xE2\\xE5\\xED\\xED\\xFB\\xE9 \\xE2\\xFB\\xF1\\xF8\\9
\\xE3\\xEE\\xF1\\xF3\\xE4
xE4\\xE6 \\xCD\\xC1 \\xD0\\xC1")');

WARNING: nonstandard use of escape in a string literal
LINE 1: select f_utf8_test('(Mozilla/4.0 (compatible; MSIE 6.0; Wind

COPY utf8_test TO '/tmp/utf8_test.txt';

COPY 1

COPY utf8_test FROM '/tmp/utf8_test.txt';

ERROR: invalid byte sequence for encoding "UTF8": 0xd3ce
HINT: This error can also happen if the byte sequence does not match the
encoding expected by the server, which is controlled by "client_encoding".
CONTEXT: COPY utf8_test, line 1

The same isuue occure when using pg_dump:

pg_dump -i -v -p 5433 -Uisdb2 -tutf8_test > /tmp/utf8_dump

pg_dump: server version: 8.2.4; pg_dump version: 8.2.1
pg_dump: proceeding despite version mismatch


psql -f"/tmp/utf8_dump"

=>
psql:/tmp/utf8_dump:40: ERROR: invalid byte sequence for encoding "UTF8":
0xd3ce
HINT: This error can also happen if the byte sequence does not match the
encoding expected by the server, which is controlled by "client_encoding".
CONTEXT: COPY utf8_test, line 1

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Gergely Bor 2007-10-25 15:16:04 Re: Yet another problem with ILIKE and UTF-8
Previous Message Tom Lane 2007-10-25 14:36:14 Re: BUG #3695: Pgsql does not report non existing function