
Re: [v9.2] make_greater_string() does not return a string in some cases

From: horiguchi(dot)kyotaro(at)oss(dot)ntt(dot)co(dot)jp
To: robertmhaas(at)gmail(dot)com
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [v9.2] make_greater_string() does not return a string in some cases
Date: 2011-10-29 17:16:03
Message-ID: 20111030.021603.01379645.horiguchi.kyotaro@horiguti.oss.ntt.co.jp
Lists: pgsql-bugs, pgsql-hackers
Hello, I feel at a loss as to what to do...

> I thought that code was looking for 0xED/0xF4 in the second position,
> but it's actually looking for them in the first position, which makes
> vastly more sense.  Whee!

Anyway, I will try to describe another aspect of this code as it
stands at present.

The switch block in pg_utf8_increment is a folded form of five
individual manipulations, one for each byte length of the
sequence. The separation presupposes that the input bytes and
length form a valid UTF-8 sequence.

For a character 5 bytes or longer, it returns false.
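
The precondition above (bytes and length forming a valid UTF-8
sequence) could be checked roughly as follows. This is only a
sketch: the names utf8_expected_length and utf8_shape_ok are made
up for illustration and are not taken from the patch, and the check
covers only the lead byte and the continuation-byte pattern, not
overlong forms or surrogates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: sequence length implied by the lead byte,
 * or 0 if the byte cannot start a valid UTF-8 sequence.
 */
static int
utf8_expected_length(unsigned char lead)
{
    if (lead < 0x80)
        return 1;               /* U+0 .. U+7f */
    if (lead >= 0xc2 && lead <= 0xdf)
        return 2;               /* U+80 .. U+7ff (c0/c1 are overlong) */
    if (lead >= 0xe0 && lead <= 0xef)
        return 3;               /* U+800 .. U+ffff */
    if (lead >= 0xf0 && lead <= 0xf4)
        return 4;               /* U+10000 .. U+10ffff */
    return 0;                   /* continuation byte or 5+ byte lead */
}

/*
 * Hypothetical shape check: the lead byte agrees with the given
 * length and every trailing byte looks like 10xxxxxx.
 */
static bool
utf8_shape_ok(const unsigned char *p, int length)
{
    int i;

    assert(p != NULL);
    if (length < 1 || utf8_expected_length(p[0]) != length)
        return false;
    for (i = 1; i < length; i++)
    {
        if ((p[i] & 0xc0) != 0x80)
            return false;
    }
    return true;
}
```

A lead byte of 0xf5 or above yields 0 here, which corresponds to
the "5 bytes or longer" case that is simply rejected.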

For 4 bytes, the sequence ranges between U+10000 and U+10ffff.

  If charptr[3] is less than 0xbf, increment it and return true.

  Else assign 0x80 to charptr[3] and then if charptr[2] is less
  than 0xbf increment it and return true.

  Else assign 0x80 to charptr[2] and then,
    if (charptr[1] is less than 0x8f when charptr[0] == 0xf4) or
       (charptr[1] is less than 0xbf when charptr[0] != 0xf4)
      increment it and return true.

  Else assign 0x80 to charptr[1] and then if charptr[0] is not
  0xf4 increment it and return true.

  Else the input sequence must be 0xf4 0x8f 0xbf 0xbf, which
  represents U+10ffff, and this is the upper limit of the UTF-8
  representation. Restore the sequence and return false.
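
The 4-byte steps above can be sketched in isolation as follows.
The function name utf8_increment_4 and the saved-copy approach to
"restore the sequence" are assumptions made for this example, not
the code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Sketch of the 4-byte case only (hypothetical name).
 * Assumes p[0..3] holds a valid 4-byte UTF-8 sequence.
 */
static bool
utf8_increment_4(unsigned char *p)
{
    unsigned char saved[4];
    unsigned char limit;

    assert(p[0] >= 0xf0 && p[0] <= 0xf4);
    memcpy(saved, p, 4);        /* kept so the input can be restored */

    if (p[3] < 0xbf)
    {
        p[3]++;
        return true;
    }
    p[3] = 0x80;

    if (p[2] < 0xbf)
    {
        p[2]++;
        return true;
    }
    p[2] = 0x80;

    /* under a 0xf4 lead the second byte stops at 0x8f (U+10ffff cap) */
    limit = (p[0] == 0xf4) ? 0x8f : 0xbf;
    if (p[1] < limit)
    {
        p[1]++;
        return true;
    }
    p[1] = 0x80;

    if (p[0] != 0xf4)
    {
        p[0]++;
        return true;
    }

    /* input was f4 8f bf bf (U+10ffff): nothing greater exists */
    memcpy(p, saved, 4);
    return false;
}
```

For example, f0 bf bf bf carries all the way up to f1 80 80 80,
while f4 8f bf bf is left untouched and false is returned.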


For 3 bytes, the sequence ranges between U+800 and U+ffff.

  If charptr[2] is less than 0xbf increment it and return true.

  Else assign 0x80 to charptr[2] and then,
    if (charptr[1] is less than 0x9f when charptr[0] == 0xed) or
       (charptr[1] is less than 0xbf when charptr[0] != 0xed) 
      increment it and return true.

    The sequence 0xed 0x9f 0xbf, which represents U+d7ff, will
    thus be incremented to 0xee 0x80 0x80 (U+e000) in the end.

  Else assign 0x80 to charptr[1] and then if charptr[0] is not
  0xef increment it and return true.

  Else the input sequence must be 0xef 0xbf 0xbf, which represents
  U+ffff, and the next UTF-8 sequence has a length of 4. Restore
  the sequence and return false.
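
A sketch of the 3-byte steps, under the same assumptions as the
description (valid input). The name utf8_increment_3 and the
saved-copy restore are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Sketch of the 3-byte case only (hypothetical name).
 * Assumes p[0..2] holds a valid 3-byte UTF-8 sequence.
 */
static bool
utf8_increment_3(unsigned char *p)
{
    unsigned char saved[3];
    unsigned char limit;

    assert(p[0] >= 0xe0 && p[0] <= 0xef);
    memcpy(saved, p, 3);        /* kept so the input can be restored */

    if (p[2] < 0xbf)
    {
        p[2]++;
        return true;
    }
    p[2] = 0x80;

    /*
     * Under a 0xed lead the second byte stops at 0x9f, so the
     * surrogate range (ed a0 80 .. ed bf bf) is never produced.
     */
    limit = (p[0] == 0xed) ? 0x9f : 0xbf;
    if (p[1] < limit)
    {
        p[1]++;
        return true;
    }
    p[1] = 0x80;

    if (p[0] != 0xef)
    {
        p[0]++;
        return true;
    }

    /* input was ef bf bf (U+ffff): the next character needs 4 bytes */
    memcpy(p, saved, 3);
    return false;
}
```

Tracing ed 9f bf through these steps, the last two bytes are reset
to 0x80 and the lead byte is bumped, giving ee 80 80.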


For 2 bytes, the sequence ranges between U+80 and U+7ff.

  If charptr[1] is less than 0xbf increment it and return true.

  Else assign 0x80 to charptr[1] and then if charptr[0] is not
  0xdf increment it and return true.

  Else the input sequence must be 0xdf 0xbf, which represents
  U+7ff, and the next UTF-8 sequence has a length of 3.  Restore the
  sequence and return false.
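
The 2-byte steps are short enough to sketch directly. The name
utf8_increment_2 is hypothetical, and valid input is assumed as in
the description.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Sketch of the 2-byte case only (hypothetical name).
 * Assumes p[0..1] holds a valid 2-byte UTF-8 sequence.
 */
static bool
utf8_increment_2(unsigned char *p)
{
    unsigned char saved1 = p[1];    /* only p[1] can get clobbered */

    assert(p[0] >= 0xc2 && p[0] <= 0xdf);

    if (p[1] < 0xbf)
    {
        p[1]++;
        return true;
    }
    p[1] = 0x80;

    if (p[0] != 0xdf)
    {
        p[0]++;
        return true;
    }

    /* input was df bf (U+7ff): the next character needs 3 bytes */
    p[1] = saved1;
    return false;
}
```

So c3 bf becomes c4 80, and df bf is restored and rejected.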


For 1 byte, the byte ranges between U+0 and U+7f.

  If charptr[0] is less than 0x7f increment it and return true.

  Else the input sequence must be 0x7f which represents U+7f and
  next UTF8 sequence has the length of 2. Restore the sequence
  and return false.
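
Putting the cases together, including the 1-byte one, the folding
into a single switch can be sketched with fall-through between the
cases. This is an illustration written from the description above,
not the code under review: the name utf8_increment_folded and the
saved-copy restore are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Sketch of the folded form (hypothetical name). Assumes p and
 * length describe a valid UTF-8 sequence of 1 to 4 bytes; any
 * other length is rejected.
 */
static bool
utf8_increment_folded(unsigned char *p, int length)
{
    unsigned char saved[4];
    unsigned char limit;

    assert(p != NULL);
    if (length < 1 || length > 4)
        return false;           /* 5 bytes or longer: give up */
    memcpy(saved, p, (size_t) length);

    switch (length)
    {
        case 4:
            if (p[3] < 0xbf) { p[3]++; return true; }
            p[3] = 0x80;
            /* fall through */
        case 3:
            if (p[2] < 0xbf) { p[2]++; return true; }
            p[2] = 0x80;
            /* fall through */
        case 2:
            /* second-byte ceiling depends on the lead byte */
            if (p[0] == 0xed)
                limit = 0x9f;   /* stay below the surrogates */
            else if (p[0] == 0xf4)
                limit = 0x8f;   /* stay at or below U+10ffff */
            else
                limit = 0xbf;
            if (p[1] < limit) { p[1]++; return true; }
            p[1] = 0x80;
            /* fall through */
        case 1:
            /* 7f, df, ef, f4 are the last lead bytes of each length */
            if (p[0] != 0x7f && p[0] != 0xdf &&
                p[0] != 0xef && p[0] != 0xf4)
            {
                p[0]++;
                return true;
            }
            break;
    }

    /* no greater sequence of the same length: restore and fail */
    memcpy(p, saved, (size_t) length);
    return false;
}
```

The fall-through works because, for valid input, the trailing
bytes are handled identically regardless of length, and only the
second byte and the lead byte need length-specific limits.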

-- 
Kyotaro Horiguchi


