
Re: PostgreSQL fails to convert decomposed utf-8 to other encodings

From:	Craig Ringer <craig(at)2ndquadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject:	Re: PostgreSQL fails to convert decomposed utf-8 to other encodings
Date:	2014-08-06 04:12:05
Message-ID:	53E1AB15.8050702@2ndquadrant.com
Lists:	pgsql-bugs

On 08/06/2014 11:54 AM, Craig Ringer wrote:
> On 08/06/2014 09:14 AM, Tom Lane wrote:
>> We don't actually support "decomposed" utf8; if there is any bug here,
>> it's that the input you show isn't rejected. But I think there was
>> some intentional choice to not check \u escapes fully.
>
> Combining characters (i.e. decomposed utf-8 form, for chars where there
> is a combined equivalent) are part of utf-8. They're not an optional add-on.

... though we can advertise partial Unicode support, saying that we
support UTF-8 for UCS (ISO 10646-1:2000 Annex D / RFC 3629)
implementation level 1 only, requiring Normalization Form C (NFC) input.
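
To illustrate the distinction, here is a minimal Python sketch (not PostgreSQL code; the helper names `to_latin1` and `converts_directly` are hypothetical). It shows the same visible character in decomposed and NFC form, and why only the NFC form converts directly to a single-byte encoding such as Latin-1:

```python
import unicodedata

# 'e' followed by COMBINING ACUTE ACCENT: two code points (decomposed form).
decomposed = "e\u0301"
# NFC composes the pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)


def converts_directly(text: str) -> bool:
    """Return True if text can be encoded as Latin-1 without normalization."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


def to_latin1(text: str) -> bytes:
    """Hypothetical helper: normalize to NFC first, then convert to Latin-1."""
    return unicodedata.normalize("NFC", text).encode("latin-1")
```

The decomposed string fails a direct conversion because U+0301 has no Latin-1 equivalent on its own, while normalizing first yields the single byte for U+00E9.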

Given that Pg doesn't seem to understand the \xf8 or \xfc lead bytes
(the 5- and 6-byte utf-8 forms), it doesn't cover the full utf-8 range,
so it doesn't look like it meets Level 1 either. It supports
"mostly-utf8".
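
As a point of comparison only, the sketch below checks how Python's built-in decoder treats sequences that start with those lead bytes. It says nothing about PostgreSQL's own validation code; the function name `decodes_as_utf8` and the sample byte strings are assumptions chosen for illustration (the smallest 5- and 6-byte forms under the older, longer UTF-8 scheme).

```python
def decodes_as_utf8(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Sequences using the \xf8 (5-byte) and \xfc (6-byte) lead bytes.
five_byte = b"\xf8\x88\x80\x80\x80"
six_byte = b"\xfc\x84\x80\x80\x80\x80"
# A 4-byte sequence for comparison (U+1F600).
four_byte = "\U0001F600".encode("utf-8")
```

Python's decoder follows the four-byte limit of RFC 3629, so it rejects both of the longer forms and accepts the 4-byte one.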

With level 1 we should really _reject_ combining chars, but we can't do
that without breaking backward compatibility.
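
A rough sketch of what such a rejection check could look like, written in Python rather than as PostgreSQL code. The function name `has_combining_chars` is hypothetical, and treating every code point in a Unicode "Mark" category as a combining character is an assumption made for illustration:

```python
import unicodedata


def has_combining_chars(text: str) -> bool:
    """Return True if any code point is in a Unicode Mark category (Mn, Mc, Me)."""
    return any(unicodedata.category(ch).startswith("M") for ch in text)
```

With this check, the decomposed string "e\u0301" would be flagged, while its NFC equivalent "\u00e9" would pass.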

I guess I should turn this:

http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-test.txt

into a regression test.

Possibly also parts of this:

http://www.columbia.edu/~fdc/utf8/

though it's more oriented toward rendering.

It's worth noting that Konsole and Thunderbird had no issues with
combining chars when I was testing this.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Re: PostgreSQL fails to convert decomposed utf-8 to other encodings at 2014-08-06 03:54:32 from Craig Ringer
