Re: Bug with UTF-8 character

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>
Cc:	pgsql-hackers(at)postgresql(dot)org, eg(at)cybertec(dot)at
Subject:	Re: Bug with UTF-8 character
Date:	2006-05-26 14:33:59
Message-ID:	25791.1148654039@sss.pgh.pa.us
Lists:	pgsql-hackers

Hans-Jürgen Schönig <postgres(at)cybertec(dot)at> writes:
> But the code does a check where the second character should not be
> greater than 0x9F, when first character is 0xED. This is not according
> to UTF-8 standard in RFC 3629.

Better read the RFC again: it says

    UTF8-3 = %xE0 %xA0-BF UTF8-tail / %xE1-EC 2( UTF8-tail ) /
             %xED %x80-9F UTF8-tail / %xEE-EF 2( UTF8-tail )
             ------------
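
A minimal sketch of that rule in Python, transcribed branch by branch from the ABNF. The helper name `is_valid_utf8_3` is hypothetical, and this is not the PostgreSQL check under discussion, only an illustration of what the grammar accepts:

```python
def is_valid_utf8_3(seq: bytes) -> bool:
    """Check a 3-byte sequence against the UTF8-3 rule of RFC 3629."""
    if len(seq) != 3:
        return False
    b1, b2, b3 = seq
    # UTF8-tail = %x80-BF; the third byte is a plain tail in every branch.
    if not 0x80 <= b3 <= 0xBF:
        return False
    if b1 == 0xE0:
        # %xE0 %xA0-BF: a lower second byte would be an overlong encoding.
        return 0xA0 <= b2 <= 0xBF
    if 0xE1 <= b1 <= 0xEC or 0xEE <= b1 <= 0xEF:
        # %xE1-EC and %xEE-EF: any tail byte is allowed in second position.
        return 0x80 <= b2 <= 0xBF
    if b1 == 0xED:
        # %xED %x80-9F: the restricted branch underlined above.
        return 0x80 <= b2 <= 0x9F
    return False
```

Under this sketch, `is_valid_utf8_3(b"\xed\x9f\xbf")` is accepted while `is_valid_utf8_3(b"\xed\xa0\x80")` is rejected, which matches the 0x9F limit on the second byte after 0xED.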

The reason for the prohibition is explained as

The definition of UTF-8 prohibits encoding character numbers between
U+D800 and U+DFFF, which are reserved for use with the UTF-16 encoding
form (as surrogate pairs) and do not directly represent characters.
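
The arithmetic behind that range can be checked with a short sketch. It assumes the standard three-byte layout (4 payload bits from the lead byte, 6 from each continuation byte); `decode_3byte` is a hypothetical helper that does no validation of its own:

```python
def decode_3byte(b1: int, b2: int, b3: int) -> int:
    """Combine the payload bits of a 3-byte UTF-8 sequence into a code point."""
    # Lead byte 1110xxxx carries 4 bits; each tail byte 10xxxxxx carries 6.
    return ((b1 & 0x0F) << 12) | ((b2 & 0x3F) << 6) | (b3 & 0x3F)


# Highest sequence the grammar allows after 0xED.
last_allowed = decode_3byte(0xED, 0x9F, 0xBF)
# First and last sequences that a second byte above 0x9F would produce.
first_excluded = decode_3byte(0xED, 0xA0, 0x80)
last_excluded = decode_3byte(0xED, 0xBF, 0xBF)

print(f"U+{last_allowed:04X} U+{first_excluded:04X} U+{last_excluded:04X}")
```

A second byte of 0xA0 through 0xBF after 0xED lands exactly on U+D800 through U+DFFF, so the limit of 0x9F is the byte-level form of the prohibition quoted above.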

I don't know anything about "surrogate pairs", but I am not about to
decide that we know more about this than the RFC authors do. If they
say it's invalid, it's invalid.

regards, tom lane

In response to

Bug with UTF-8 character at 2006-05-26 06:21:56 from Hans-Jürgen Schönig
