On Fri, Jan 20, 2012 at 10:27 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> The code I've written so far does no canonicalization of the input
>> value of any kind, just as we do for XML.
> Fair enough.
>> So, given that framework, what the patch does is this: if you're using
>> UTF-8, then \uXXXX is accepted, provided that XXXX is something that
>> equates to a legal Unicode code point. It isn't converted to the
>> corresponding character: it's just validated. If you're NOT using
UTF-8, then it allows \uXXXX for code points up through 127 (which we
assume are the same in all encodings), and anything higher than that is
rejected.
> This seems a bit silly. If you're going to leave the escape sequence as
> ASCII, then why not just validate that it names a legal Unicode code
> point and be done? There is no reason whatever that that behavior needs
> to depend on the database encoding.
Mostly because that would prevent us from adding canonicalization in
the future, AFAICS, and I don't want to back myself into a corner.