Re: Bug in UTF8-Validation Code?

From: Michael Paesold <mpaesold(at)gmx(dot)at>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Albe Laurenz <all(at)adv(dot)magwien(dot)gv(dot)at>, Mario Weilguni *EXTERN* <mweilguni(at)sime(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bug in UTF8-Validation Code?
Date: 2007-03-14 07:01:53
Message-ID: 45F79DE1.1070700@gmx.at
Lists: pgsql-hackers

Andrew Dunstan wrote:
> Albe Laurenz wrote:
>> A fix could be either that the server checks escape sequences for
>> validity
>>
>
> This strikes me as essential. If the db has a certain encoding ISTM we
> are promising that all the text data is valid for that encoding.
>
> The question in my mind is how we help people to recover from the fact
> that we haven't done that.

I would also call it a bug that escape sequences can put byte sequences
into the database that are not valid in the database's encoding. Compare
the encoding to a table constraint: there is no way to simply "escape"
your way past a constraint check.

This seems to violate the principle of consistency in ACID. Additionally,
if you count pg_dump as part of the ACID guarantees, it also violates
durability, since pg_dump cannot restore what it wrote itself.
Is there anything in the SQL spec that asks for such behaviour? I guess not.
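
To make the problem concrete, here is a rough sketch (in Python, not
PostgreSQL's actual code) of the kind of check being asked for: expand the
escape sequences first, then validate the resulting bytes against the
encoding. The function names unescape_octal and is_valid_utf8 are invented
for this illustration, and only backslash-octal escapes such as \303 are
handled; the real server would have to cover every escape form it accepts.

```python
import re

# Matches a backslash followed by one to three octal digits, e.g. \303
_OCTAL_ESCAPE = re.compile(r"\\([0-7]{1,3})")


def unescape_octal(literal: str) -> bytes:
    """Expand backslash-octal escapes into raw bytes.

    Hypothetical helper: text outside the escapes is encoded as UTF-8,
    and each escape contributes exactly one byte (value masked to 8 bits).
    """
    out = bytearray()
    pos = 0
    for match in _OCTAL_ESCAPE.finditer(literal):
        out += literal[pos:match.start()].encode("utf-8")
        out.append(int(match.group(1), 8) & 0xFF)
        pos = match.end()
    out += literal[pos:].encode("utf-8")
    return bytes(out)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes form a well-formed UTF-8 sequence."""
    try:
        data.decode("utf-8")  # strict decoding rejects malformed input
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    for literal in (r"\303\244", r"\303", r"abc\377"):
        raw = unescape_octal(literal)
        print(literal, raw, is_valid_utf8(raw))
```

The point of the sketch is the ordering: validation has to run on the bytes
that result from the escapes, not on the literal as typed, otherwise a lone
\303 passes the check and ends up stored as a truncated multibyte sequence.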

A DBA will usually not even learn about this issue until they are presented
with a failing restore.

Best Regards,
Michael Paesold
