
Re: UTF16 surrogate pairs in UTF8 encoding

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Marko Kreen <markokr(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: UTF16 surrogate pairs in UTF8 encoding
Date: 2011-02-20 00:00:30
Message-ID: 201102200000.p1K00Ui04261@momjian.us
Lists: pgsql-hackers
Marko Kreen wrote:
> On 9/8/10, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Marko Kreen <markokr(at)gmail(dot)com> writes:
> >  > Although it does seem unnecessary.
> >
> >
> > The reason I asked for this to be spelled out is that ordinarily,
> >  a backslash escape \nnn is a very low-level thing that will insert
> >  exactly what you say.  To me it's quite unexpected that the system
> >  would editorialize on that to the extent of replacing two UTF16
> >  surrogate characters by a single code point.  That's necessary for
> >  correctness because our underlying storage is UTF8, but it's not
> >  obvious that it will happen.  (As a counterexample, if our underlying
> >  storage were UTF16, then very different things would need to happen
> >  for the exact same SQL input.)
> >
> >  I think a lot of people will have this same question when reading
> >  this para, which is why I asked for an explanation there.
> 
> Ok, but I still don't like the "when"s.  How about:
> 
> -    6-digit form technically makes this unnecessary.  (When surrogate
> -    pairs are used when the server encoding is <literal>UTF8</>, they
> -    are first combined into a single code point that is then encoded
> -    in UTF-8.)
> +    6-digit form technically makes this unnecessary.  (Surrogate
> +    pairs are not stored directly, but combined into a single
> +    code point that is then encoded in UTF-8.)

Applied, thanks.
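
As an illustration of the revised wording above, the "combined into a single code point that is then encoded in UTF-8" step can be sketched as follows. This is a minimal sketch in Python, not the server's actual code: the function name combine_surrogates and the sample pair D83D/DE00 are assumptions made up for the example.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into one Unicode code point.

    Hypothetical helper for illustration only; not a PostgreSQL function.
    """
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("first value is not a high surrogate")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("second value is not a low surrogate")
    # Each surrogate carries 10 bits of the offset above U+10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# Sample input: the surrogate pair D83D DE00 (chosen for the example).
code_point = combine_surrogates(0xD83D, 0xDE00)

# The single code point is what gets encoded, giving one 4-byte sequence
# rather than two separately encoded surrogates.
utf8_bytes = chr(code_point).encode("utf-8")

print(hex(code_point), utf8_bytes.hex())
```

The pair is never encoded half by half; only the combined code point reaches the UTF-8 encoder, which is why the stored bytes would differ if the underlying storage were UTF16.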

-- 
  Bruce Momjian  <bruce(at)momjian(dot)us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
