Re: Bug in UTF8-Validation Code?

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Mark Dilger <pgsql(at)markdilger(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>
Subject: Re: Bug in UTF8-Validation Code?
Date: 2007-06-13 23:35:55
Message-ID: 46707F5B.7080802@dunslane.net
Lists: pgsql-hackers


What is the state of play with this item? I think this is a must-fix bug
for 8.3. There was a flurry of messages back in April, but I don't recall
seeing anything since then.

cheers

andrew

Mark Dilger wrote:
> Mark Dilger wrote:
>> Bruce Momjian wrote:
>>> Added to TODO:
>>>
>>> * Fix cases where invalid byte encodings are accepted by the database,
>>> but throw an error on SELECT
>>>
>>> http://archives.postgresql.org/pgsql-hackers/2007-03/msg00767.php
>>>
>>> Is anyone working on fixing this bug?
>>
>> Hi, has anyone volunteered to fix this bug? I did not see any reply
>> on the mailing list to your question above.
>>
>> mark
>
> OK, I can take a stab at fixing this. I'd like to state some
> assumptions so people can comment and reply:
>
> I assume that I need to fix *all* cases where invalid byte encodings
> get into the database through functions shipped in the core distribution.
>
> I assume I do not need to worry about people getting bad data into the
> system through their own database extensions.
>
> I assume that the COPY problem discussed up-thread goes away once you
> eliminate all the paths by which bad data can get into the system.
> However, existing database installations with bad data already loaded
> will not be magically fixed with these code patches.
>
> Do any of the string functions (see
> http://www.postgresql.org/docs/8.2/interactive/functions-string.html)
> run the risk of generating invalid UTF-8 encoded strings? Do I need to
> add checks? Are there known bugs with these functions in this regard?
>
> If not, I assume I can add mbverify calls to the various input
> routines (textin, varcharin, etc.) where invalid UTF-8 could otherwise
> enter the system.
>
> I assume that this work can be limited to HEAD and that I don't need
> to back-patch it. (I suspect this assumption is a contentious one.)
>
> Advice and comments are welcome,
>
>
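
For readers following the proposal quoted above, here is a minimal sketch of
the kind of check an input routine such as textin would need to run before
accepting a string. It is an illustration only: utf8_is_valid is a
hypothetical standalone helper, not a function from the PostgreSQL source
tree, and an actual patch would presumably use the backend's existing
multibyte verification code instead. The sketch assumes the stricter
definition of well-formed UTF-8 (RFC 3629), which rejects overlong forms,
surrogates, and code points above U+10FFFF.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of PostgreSQL): returns true if the
 * len bytes starting at s are well-formed UTF-8.
 */
static bool
utf8_is_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len)
    {
        unsigned char c = s[i];
        size_t        need;   /* number of continuation bytes expected */
        unsigned long cp;     /* code point decoded so far */
        unsigned long min;    /* smallest code point legal at this length */

        if (c < 0x80)
        {
            i++;              /* plain ASCII byte */
            continue;
        }
        else if ((c & 0xE0) == 0xC0) { need = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { need = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { need = 3; cp = c & 0x07; min = 0x10000; }
        else
            return false;     /* stray continuation byte or invalid lead byte */

        if (len - i - 1 < need)
            return false;     /* sequence is cut off at end of input */

        for (size_t k = 1; k <= need; k++)
        {
            unsigned char cc = s[i + k];

            if ((cc & 0xC0) != 0x80)
                return false; /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (cp < min)
            return false;     /* overlong encoding */
        if (cp > 0x10FFFF)
            return false;     /* beyond the Unicode range */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return false;     /* UTF-16 surrogate, not a valid scalar value */

        i += need + 1;
    }
    return true;
}
```

An input routine would call a check like this on the incoming bytes and
raise an error rather than store the value when it returns false. That is
the point of the proposal: reject bad data on the way in, so it cannot
surface later as an error on SELECT.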
