Re: [rfc] unicode escapes for extended strings

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Marko Kreen <markokr(at)gmail(dot)com>
Cc: "tomas(at)tuxteam(dot)de" <tomas(at)tuxteam(dot)de>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Postgres Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [rfc] unicode escapes for extended strings
Date: 2009-09-25 12:37:50
Message-ID: 4ABCB99E.6@dunslane.net
Lists: pgsql-hackers

Marko Kreen wrote:
> On 9/25/09, tomas(at)tuxteam(dot)de <tomas(at)tuxteam(dot)de> wrote:
>
>> On Thu, Sep 24, 2009 at 09:42:32PM +0300, Peter Eisentraut wrote:
>> > Good idea. This could also check for other invalid things like
>> > byte-order marks in UTF-8.
>>
>> But watch out. Microsoft apps do like to insert a BOM at the beginning
>> of the text. Not that I think it's a good idea, but the Unicode folks
>> seem to think it's OK [1] :-(
>>
>
> As a BOM does not actively break transport layers, it's less clear-cut
> whether to reject it. It could be said that a BOM at the start of a string
> is OK. A BOM in the middle of a string is more rejectable. But it will
> only confuse some high-level character counters, not low-level encoders.
>
>

It seems pretty clear from the URL that Tomas posted that we should not
treat a BOM specially at all, but simply handle it like any other Unicode
character.
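
To make "just another Unicode char" concrete, here is a minimal sketch in
Python rather than in the backend's C. It is an illustration only, not the
proposed patch: the helper name decode_utf8_plain is hypothetical, and it
relies on Python's plain "utf-8" codec, which keeps U+FEFF wherever it
appears (the separate "utf-8-sig" codec is the one that strips a leading
BOM).

```python
# U+FEFF is an ordinary code point; its UTF-8 encoding is EF BB BF.
BOM = "\ufeff"


def decode_utf8_plain(data: bytes) -> str:
    """Hypothetical helper: decode UTF-8 without special-casing a BOM."""
    # The plain "utf-8" codec neither strips nor rejects U+FEFF.
    return data.decode("utf-8")


# A BOM at the start of the string is kept as one character.
leading = decode_utf8_plain(b"\xef\xbb\xbfabc")

# A BOM in the middle of the string is kept as one character too.
middle = decode_utf8_plain(b"ab\xef\xbb\xbfc")

# Round-tripping reproduces the original bytes exactly.
round_trip = leading.encode("utf-8")
```

Under this treatment a high-level character counter sees one extra
character in each string, which is the confusion Marko mentions, while the
byte-level encoding and decoding are unaffected.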

cheers

andrew
