Re: Bug in UTF8-Validation Code?

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Mario Weilguni <mweilguni(at)sime(dot)com>
Cc: Albe Laurenz <all(at)adv(dot)magwien(dot)gv(dot)at>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bug in UTF8-Validation Code?
Date: 2007-03-13 14:12:55
Message-ID: 45F6B167.8070401@dunslane.net
Lists: pgsql-hackers

Mario Weilguni wrote:
> On Tuesday, 13 March 2007 14:46, Albe Laurenz wrote:
>
>> Mario Weilguni wrote:
>>
>>> Steps to reproduce:
>>> create database testdb with encoding='UTF8';
>>> \c testdb
>>> create table test(x text);
>>> insert into test values ('\244'); ==> Is accepted, even though it is not valid UTF8.
>>>
>> This is working as expected, see the remark in
>> http://www.postgresql.org/docs/current/static/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS
>>
>> "It is your responsibility that the byte sequences you create
>> are valid characters in the server character set encoding."
>>
>
> In that case, pg_dump is doing the wrong thing here and should quote the
> output. IMO it cannot be described as working as expected when this makes
> database dumps worthless, without any warning at dump time.
>
> pg_dump should output \244 itself in that case.
>
>

The sentence quoted from the docs is perhaps less than a model of
clarity. I would take it to mean that no client-encoding ->
server-encoding translation will take place. Does it really mean that
the server will happily accept any escaped byte sequence, whether or not
it is valid for the server encoding? If so that seems ... odd.
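
For anyone following along, here is a rough sketch of why the byte in
Mario's example is a problem. It runs outside the database and only
checks the bytes themselves; the helper name is_valid_utf8 is made up for
illustration and has nothing to do with the server's own validation
routines. Octal 244 is the single byte 0xA4, which in UTF-8 can only
appear as a continuation byte, so on its own it is not a valid sequence.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False


# The byte from the test case: octal 244 is 0xA4.
raw = b"\244"

# U+00A4 encoded properly in UTF-8 is the two-byte sequence C2 A4.
encoded = "\u00a4".encode("utf-8")

print(is_valid_utf8(raw))      # lone continuation byte
print(is_valid_utf8(encoded))  # well-formed two-byte sequence
```

If the server stores the lone 0xA4 as-is, a later dump would contain that
same byte, which is presumably why the restore fails for Mario.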

cheers

andrew
