Skip site navigation (1) Skip section navigation (2)

Re: Bug in UTF8-Validation Code?

From: Mark Dilger <pgsql(at)markdilger(dot)com>
To: Mark Dilger <pgsql(at)markdilger(dot)com>
Cc: andrew(at)supernews(dot)com
Subject: Re: Bug in UTF8-Validation Code?
Date: 2007-04-02 20:50:36
Message-ID: 46116C9C.80300@markdilger.com (view raw or flat)
Thread:
Lists: pgsql-hackers
Mark Dilger wrote:
> Andrew - Supernews wrote:
>> On 2007-04-01, Mark Dilger <pgsql(at)markdilger(dot)com> wrote:
>>> Do any of the string functions (see 
>>> http://www.postgresql.org/docs/8.2/interactive/functions-string.html) 
>>> run the risk of generating invalid utf8 encoded strings?  Do I need 
>>> to add checks?
>>> Are there known bugs with these functions in this regard?
>>
>> The chr() function returns an octet, rather than a character; this is 
>> clearly
>> wrong and needs fixing.
>>
> 
> Ok, I've altered the chr() function.  I am including a transcript from 
> psql below.  There are several design concerns:
> 
> 1) In the current implementation, chr(0) returns a 5-byte text object 
> (4-bytes of overhead plus one byte of data) containing a null.  In the 
> new implementation, this returns an error.  I don't know, but it is 
> possible that people currently use things like "SELECT chr(0) || chr(0) 
> || ..." to build up strings of nulls.
> 
> 2) Under utf8, chr(X) fails for X = 128..255.  This may also break 
> current users expectations.
> 
> 3) The implicit modulus operation that was being performed by chr() is 
> now gone, which might break some users.
> 
> 4) You can't represent the high end of the astral plain with type 
> INTEGER, unless you pass in a negative value, which is somewhat 
> unintuitive.  Since chr() expects an integer (and not a bigint) the user 
> needs handle the sign bit correctly.
> 
> mark
> 
> ---------------------
> 
> 
> 
> 
> Welcome to psql 8.3devel, the PostgreSQL interactive terminal.
> 
> Type:  \copyright for distribution terms
>        \h for help with SQL commands
>        \? for help with psql commands
>        \g or terminate with semicolon to execute query
>        \q to quit
> 
> pgsql=# select chr(0);
> ERROR:  character 0x00 of encoding "SQL_ASCII" has no equivalent in "UTF8"
> pgsql=# select chr(65);
>  chr
> -----
>  A
> (1 row)
> 
> pgsql=# select chr(128);
> ERROR:  character 0x80 of encoding "SQL_ASCII" has no equivalent in "UTF8"
> pgsql=# select chr(53398);
>  chr
> -----
>  Ж
> (1 row)
> 
> pgsql=# select chr(14989485);
>  chr
> -----
>  中
> (1 row)
> 
> pgsql=# select chr(4036005254);
> ERROR:  function chr(bigint) does not exist
> LINE 1: select chr(4036005254);
>                ^
> HINT:  No function matches the given name and argument types. You might 
> need to add explicit type casts.


Here's the code for the new chr() function:

Datum
chr(PG_FUNCTION_ARGS)
{
     int32       cvalue = PG_GETARG_INT32(0);
     text       *result;

     if (pg_database_encoding_max_length() > 1 && !lc_ctype_is_c())
     {
         int encoding,
             len,
             byteoff;
         uint32 buf[2];
         const char *bufptr;

         encoding = GetDatabaseEncoding();
         buf[0] = htonl(cvalue);
         buf[1] = 0;
         bufptr = (const char *)&buf;
         for (byteoff = 0; byteoff < sizeof(uint32) && 0 == *bufptr; ++byteoff, 
++bufptr);
         len = pg_encoding_mblen(encoding,bufptr);
         if (byteoff + len != sizeof(uint32) || !pg_verify_mbstr(encoding, 
bufptr, len, true /* noError */))
             report_untranslatable_char(PG_SQL_ASCII, encoding, bufptr, 
sizeof(int32));
         result = (text *) palloc(VARHDRSZ + len);
         SET_VARSIZE(result, VARHDRSZ + len);
         memcpy(VARDATA(result),bufptr,len);
     }
     else
     {
         result = (text *) palloc(VARHDRSZ + 1);
         SET_VARSIZE(result, VARHDRSZ + 1);
         *VARDATA(result) = (char) cvalue;
     }

     PG_RETURN_TEXT_P(result);
}

In response to

Responses

pgsql-hackers by date

Next:From: Heikki LinnakangasDate: 2007-04-02 20:50:49
Subject: Re: CheckpointStartLock starvation
Previous:From: Gregory StarkDate: 2007-04-02 20:43:00
Subject: Re: Is this portable?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group