| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
|---|---|
| To: | Hans-Jürgen Schönig <postgres(at)cybertec(dot)at> |
| Cc: | pgsql-hackers(at)postgresql(dot)org, eg(at)cybertec(dot)at |
| Subject: | Re: Bug with UTF-8 character |
| Date: | 2006-05-26 14:33:59 |
| Message-ID: | 25791.1148654039@sss.pgh.pa.us |
| Lists: | pgsql-hackers |
Hans-Jürgen Schönig <postgres(at)cybertec(dot)at> writes:
> But the code does a check where the second character should not be
> greater than 0x9F, when first character is 0xED. This is not according
> to UTF-8 standard in RFC 3629.
Better read the RFC again: it says
   UTF8-3      = %xE0 %xA0-BF UTF8-tail / %xE1-EC 2( UTF8-tail ) /
                 %xED %x80-9F UTF8-tail / %xEE-EF 2( UTF8-tail )
                 ------------
The reason for the prohibition is explained as
   The definition of UTF-8 prohibits encoding character numbers between
   U+D800 and U+DFFF, which are reserved for use with the UTF-16 encoding
   form (as surrogate pairs) and do not directly represent characters.
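
To see why the second-byte limit of 0x9F after 0xED corresponds to that code point range, here is a minimal sketch in C of the bit arithmetic for a three-byte sequence. The name utf8_3_decode is hypothetical and is not taken from the PostgreSQL source; the function does no validation at all, it only combines the payload bits.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from the PostgreSQL source): combine the
 * payload bits of a three-byte UTF-8 sequence into a code point.
 * Layout: 1110xxxx 10xxxxxx 10xxxxxx.  No validation is done here.
 */
static uint32_t
utf8_3_decode(const unsigned char *s)
{
    return ((uint32_t) (s[0] & 0x0F) << 12) |   /* 4 bits from the lead byte */
           ((uint32_t) (s[1] & 0x3F) << 6) |    /* 6 bits from the first tail */
            (uint32_t) (s[2] & 0x3F);           /* 6 bits from the second tail */
}
```

With a lead byte of 0xED the top four bits are 0xD, so a second byte in 0xA0-0xBF yields exactly U+D800 through U+DFFF, while 0x80-0x9F stays at or below U+D7FF.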
I don't know anything about "surrogate pairs", but I am not about to
decide that we know more about this than the RFC authors do. If they
say it's invalid, it's invalid.
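
For illustration only, the UTF8-3 rule quoted above can be written as a check along these lines. This is a sketch under my own assumptions, not the code in the backend: the names is_tail and utf8_3_is_valid are made up, and the caller is assumed to have already made sure three bytes are available.

```c
#include <stdbool.h>

/* True if b is a continuation byte (UTF8-tail = %x80-BF in RFC 3629). */
static bool
is_tail(unsigned char b)
{
    return b >= 0x80 && b <= 0xBF;
}

/*
 * Hypothetical helper: check one three-byte sequence against the
 * UTF8-3 production of RFC 3629.  The caller must supply 3 bytes.
 */
static bool
utf8_3_is_valid(const unsigned char *s)
{
    unsigned char b0 = s[0];
    unsigned char b1 = s[1];

    if (!is_tail(s[2]))
        return false;

    if (b0 == 0xE0)
        return b1 >= 0xA0 && b1 <= 0xBF;    /* rejects overlong forms */
    if (b0 >= 0xE1 && b0 <= 0xEC)
        return is_tail(b1);
    if (b0 == 0xED)
        return b1 >= 0x80 && b1 <= 0x9F;    /* rejects U+D800..U+DFFF */
    if (b0 == 0xEE || b0 == 0xEF)
        return is_tail(b1);

    return false;                           /* not a three-byte lead byte */
}
```

Under this sketch ED 9F BF (U+D7FF) passes and ED A0 80 (which would be U+D800) is rejected, which matches the restriction on the second byte described in the original report.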
regards, tom lane