Re: Bug in UTF8-Validation Code?

From:	Mark Dilger <pgsql(at)markdilger(dot)com>
To:	Mark Dilger <pgsql(at)markdilger(dot)com>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>
Subject:	Re: Bug in UTF8-Validation Code?
Date:	2007-04-01 02:47:21
Message-ID:	460F1D39.8010709@markdilger.com
Lists:	pgsql-hackers

Mark Dilger wrote:
> Bruce Momjian wrote:
>> Added to TODO:
>>
>> * Fix cases where invalid byte encodings are accepted by the database,
>>   but throw an error on SELECT
>>
>> http://archives.postgresql.org/pgsql-hackers/2007-03/msg00767.php
>>
>> Is anyone working on fixing this bug?
>
> Hi, has anyone volunteered to fix this bug? I did not see any reply on
> the mailing list to your question above.
>
> mark

OK, I can take a stab at fixing this. I'd like to state some assumptions so
people can comment and reply:

I assume that I need to fix *all* cases where invalid byte encodings get into
the database through functions shipped in the core distribution.

I assume I do not need to worry about people getting bad data into the system
through their own database extensions.

I assume that the COPY problem discussed up-thread goes away once you eliminate
all the paths by which bad data can get into the system. However, existing
database installations with bad data already loaded will not be magically fixed
with these code patches.

Do any of the string functions (see
http://www.postgresql.org/docs/8.2/interactive/functions-string.html) run the
risk of generating invalid UTF-8 encoded strings? If so, do I need to add checks
to them? Are there known bugs with these functions in this regard?
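
To make the concern concrete, here is a minimal sketch of how a byte-oriented
operation could turn valid UTF-8 into invalid UTF-8, and how a character-aware
one avoids it. It is purely illustrative: utf8_clip_len is a hypothetical helper,
not a PostgreSQL function, and nothing here claims that any function on the page
linked above actually behaves this way. It assumes its input is already valid
UTF-8.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of PostgreSQL): given valid UTF-8 in
 * s[0..len), return the largest byte count <= max_bytes that does not
 * cut a multibyte character in half.
 *
 * A naive truncation to max_bytes could stop between a lead byte and
 * its continuation bytes, leaving an invalid sequence at the end.
 */
size_t
utf8_clip_len(const unsigned char *s, size_t len, size_t max_bytes)
{
    size_t cut;

    assert(s != NULL || len == 0);

    /* Nothing to truncate: the whole string fits. */
    if (len <= max_bytes)
        return len;

    /*
     * max_bytes < len here, so s[cut] is in bounds.  If the first byte
     * that would be dropped is a continuation byte (10xxxxxx), the cut
     * falls inside a character, so back up to that character's start.
     */
    cut = max_bytes;
    while (cut > 0 && (s[cut] & 0xC0) == 0x80)
        cut--;

    return cut;
}
```

For the three-byte string "a" followed by U+00E9 (bytes 0x61 0xC3 0xA9), a
two-byte limit clips to one byte rather than leaving a dangling 0xC3.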

If not, I assume I can add mbverify calls to the various input routines (textin,
varcharin, etc.) where invalid UTF-8 could otherwise enter the system.
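
As a rough sketch of the kind of check such a call would perform at an input
boundary, the function below validates a byte buffer as well-formed UTF-8 in the
sense of RFC 3629. It is a standalone illustration under stated assumptions:
utf8_is_valid is a hypothetical name, not the backend's verification routine, and
the real code paths in textin, varcharin, and friends are not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a PostgreSQL function): return true if the
 * len bytes at s are well-formed UTF-8 per RFC 3629.  Rejects stray
 * continuation bytes, truncated sequences, overlong encodings, UTF-16
 * surrogates, and code points above U+10FFFF.
 */
bool
utf8_is_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;

    assert(s != NULL || len == 0);

    while (i < len)
    {
        unsigned char c = s[i];
        size_t        n;    /* continuation bytes expected */
        unsigned long cp;   /* code point being decoded */
        unsigned long min;  /* smallest code point legal at this length */

        if (c < 0x80)
        {
            i++;            /* plain ASCII */
            continue;
        }
        else if ((c & 0xE0) == 0xC0) { n = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07; min = 0x10000; }
        else
            return false;   /* stray continuation byte or 0xF8..0xFF */

        /* i < len, so len - i - 1 cannot underflow. */
        if (n > len - i - 1)
            return false;   /* sequence runs past the end of the buffer */

        for (size_t k = 1; k <= n; k++)
        {
            unsigned char cc = s[i + k];

            if ((cc & 0xC0) != 0x80)
                return false;   /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (cp < min)
            return false;   /* overlong encoding, e.g. 0xC0 0x80 */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return false;   /* surrogate halves are not valid in UTF-8 */
        if (cp > 0x10FFFF)
            return false;   /* beyond the Unicode range */

        i += n + 1;
    }

    return true;
}
```

An input routine would call something like this on the incoming bytes and raise
an error instead of storing the value when it returns false. Rejecting at input
time is what keeps a later SELECT from being the first place the bad bytes are
noticed.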

I assume that this work can be limited to HEAD and that I don't need to
back-patch it. (I suspect this assumption is a contentious one.)

Advice and comments are welcome,

mark

In response to

Re: Bug in UTF8-Validation Code? at 2007-04-01 00:04:01 from Mark Dilger

Responses

Re: Bug in UTF8-Validation Code? at 2007-04-01 10:30:51 from Martijn van Oosterhout
Re: Bug in UTF8-Validation Code? at 2007-04-01 15:44:17 from Andrew - Supernews
Re: Bug in UTF8-Validation Code? at 2007-06-13 23:35:55 from Andrew Dunstan
