
Re: Bug in UTF8-Validation Code?

From: Mark Dilger <pgsql(at)markdilger(dot)com>
To: andrew(at)supernews(dot)com
Subject: Re: Bug in UTF8-Validation Code?
Date: 2007-04-02 20:35:20
Message-ID: 46116908.8040702@markdilger.com
Lists: pgsql-hackers
Andrew - Supernews wrote:
> On 2007-04-01, Mark Dilger <pgsql(at)markdilger(dot)com> wrote:
>> Do any of the string functions (see 
>> http://www.postgresql.org/docs/8.2/interactive/functions-string.html) run the 
>> risk of generating invalid utf8 encoded strings?  Do I need to add checks?
>> Are there known bugs with these functions in this regard?
> 
> The chr() function returns an octet, rather than a character; this is clearly
> wrong and needs fixing.
> 

Ok, I've altered the chr() function.  I am including a transcript from psql 
below.  There are several design concerns:

1) In the current implementation, chr(0) returns a 5-byte text object (4 bytes
of overhead plus one byte of data) containing a NUL byte.  In the new
implementation, this raises an error.  I don't know for sure, but people may
currently use expressions like "SELECT chr(0) || chr(0) || ..." to build up
strings of NULs.

2) Under UTF-8, chr(X) fails for X = 128..255, because a lone byte in that range
is not a valid UTF-8 character.  This may also break current users'
expectations.

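3) The implicit modulus operation that chr() used to perform is now gone.  The
old chr() kept only the low-order byte of its argument, so chr(321) gave the
same result as chr(65).  Removing this might break some existing code.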

4) You can't represent characters in the astral planes with type INTEGER unless
you pass in a negative value, which is somewhat unintuitive: their four-byte
UTF-8 encodings all start with 0xF0 or higher, so packed into an integer they
exceed 2^31 - 1.  Since chr() expects an integer (not a bigint), the user needs
to handle the sign bit correctly.
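
To make points 1 and 3 concrete, here is a rough Python model of the pre-patch
chr(), assuming it allocates VARHDRSZ + 1 bytes (a 4-byte length word plus one
data byte) and stores its argument through a C (char) cast, which keeps only
the low-order 8 bits on the usual two's-complement platforms.  The helper names
make_text, old_chr and concat are hypothetical, and the little-endian length
word is an assumption of the model (the real header uses native byte order).

```python
VARHDRSZ = 4  # bytes of length-word overhead per text datum (point 1)


def make_text(payload: bytes) -> bytes:
    """Simplified text datum: a length word covering header + data, then the data.

    Little-endian byte order is an assumption of this model.
    """
    total = VARHDRSZ + len(payload)
    return total.to_bytes(VARHDRSZ, "little") + payload


def old_chr(x: int) -> bytes:
    """Hypothetical model of the pre-patch chr(): one byte holding x mod 256."""
    return make_text(bytes([x % 256]))  # mimics the (char) cast (point 3)


def concat(a: bytes, b: bytes) -> bytes:
    """Hypothetical model of a || b: join the payloads under a new length word."""
    return make_text(a[VARHDRSZ:] + b[VARHDRSZ:])


nuls = concat(concat(old_chr(0), old_chr(0)), old_chr(0))
print(len(old_chr(0)), nuls[VARHDRSZ:])  # 5 b'\x00\x00\x00'
print(old_chr(321) == old_chr(65) == old_chr(-191))  # True
```

Under this model, chr(0) || chr(0) || chr(0) yields three NUL bytes at the
storage level, and 65, 321 and -191 all map to 'A'; points 1 and 3 describe
both of these behaviours going away with the patch.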

mark

---------------------




Welcome to psql 8.3devel, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
        \h for help with SQL commands
        \? for help with psql commands
        \g or terminate with semicolon to execute query
        \q to quit

pgsql=# select chr(0);
ERROR:  character 0x00 of encoding "SQL_ASCII" has no equivalent in "UTF8"
pgsql=# select chr(65);
  chr
-----
  A
(1 row)

pgsql=# select chr(128);
ERROR:  character 0x80 of encoding "SQL_ASCII" has no equivalent in "UTF8"
pgsql=# select chr(53398);
  chr
-----
  Ж
(1 row)

pgsql=# select chr(14989485);
  chr
-----
  中
(1 row)

pgsql=# select chr(4036005254);
ERROR:  function chr(bigint) does not exist
LINE 1: select chr(4036005254);
                ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.
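
The successful calls in the transcript suggest that the patched chr() reads its
argument as the character's UTF-8 bytes packed into a big-endian integer: 53398
is 0xD096, the UTF-8 encoding of Ж (U+0416), and 14989485 is 0xE4B8AD, the
encoding of 中 (U+4E2D).  The Python sketch below decodes arguments under that
inferred convention; the function name unpack_utf8 is hypothetical and not part
of the patch.  It also illustrates point 2: chr(128) fails because a lone 0x80
byte is a UTF-8 continuation byte with no lead byte.  The explicit zero check
mirrors the transcript, since Python itself would happily decode a NUL byte.

```python
def unpack_utf8(value: int) -> str:
    """Decode a chr() argument as big-endian UTF-8 bytes (inferred convention).

    Raises ValueError (or its subclass UnicodeDecodeError) when the bytes are
    not exactly one valid UTF-8 character.
    """
    if value < 0:
        raise ValueError("negative arguments are not handled in this sketch")
    if value == 0:
        raise ValueError("NUL is rejected, as chr(0) is in the transcript")
    length = (value.bit_length() + 7) // 8  # minimal number of bytes
    text = value.to_bytes(length, "big").decode("utf-8")  # may raise
    if len(text) != 1:
        raise ValueError(f"{value:#x} encodes {len(text)} characters, not one")
    return text


for arg in (65, 53398, 14989485):
    print(arg, hex(arg), f"U+{ord(unpack_utf8(arg)):04X}")
# 65 0x41 U+0041
# 53398 0xd096 U+0416
# 14989485 0xe4b8ad U+4E2D

try:
    unpack_utf8(128)  # lone 0x80: a continuation byte with no lead byte
except UnicodeDecodeError as err:
    print("128 rejected:", err.reason)
```

The multi-character check matters because, under this packing, a value such as
0x4142 would otherwise decode to the two-character string "AB".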

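Point 4 and the final error follow from the same convention: every four-byte
UTF-8 sequence starts with a byte of 0xF0 or above, so its packed value exceeds
the int4 maximum of 2^31 - 1, and the literal 4036005254 (0xF0908D86, the
encoding of U+10346) is typed as bigint.  Below is a minimal sketch of the
sign-bit handling a user would need, assuming (as point 4 implies) that the
patched chr() treats a negative int4 as the same 32 bits; packed_utf8 and
to_int4 are hypothetical helper names.

```python
INT4_MAX = 2**31 - 1


def packed_utf8(ch: str) -> int:
    """Pack the UTF-8 bytes of a single character into a big-endian integer."""
    return int.from_bytes(ch.encode("utf-8"), "big")


def to_int4(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed int4 (two's complement)."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError(f"{value} does not fit in 32 bits")
    return value - 2**32 if value > INT4_MAX else value


astral = "\U00010346"
print(packed_utf8(astral))           # 4036005254
print(to_int4(packed_utf8(astral)))  # -258962042
```

Under point 4's description, select chr(-258962042) would presumably be the
int4 spelling of the failing chr(4036005254).
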
Copyright © 1996-2014 The PostgreSQL Global Development Group