From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Steven Schlansker <steven(at)trumpet(dot)io> |
Cc: | pgsql-bugs(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: COPY FROM/TO losing a single byte of a multibyte UTF-8 sequence |
Date: | 2010-08-20 19:50:13 |
Message-ID: | 25852.1282333813@sss.pgh.pa.us |
Lists: | pgsql-bugs pgsql-hackers |
Steven Schlansker <steven(at)trumpet(dot)io> writes:
> On Aug 19, 2010, at 3:24 PM, Tom Lane wrote:
>> We generally assume that in server-safe encodings, the ctype.h functions
>> will behave sanely on any single-byte value. You can argue the wisdom
>> of that, but deciding to change that policy would be a rather massive
>> code change; I'm not excited about going that direction.
> Fair enough. I presume there are no "server-safe encodings" for which
> a multibyte sequence 0x XX20 would be valid - which would break anyway
> (as the second byte looks like a real space)
Right: our definition of a "server-safe encoding" is precisely that no
byte of a multibyte character looks like ASCII, i.e., all bytes have their
high bit set. We're essentially assuming that the <ctype.h> functions
will all return false for any byte with the high bit set, if the
selected encoding is multibyte.
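
To make the definition above concrete, here is a minimal sketch in C. Both helper names are hypothetical and are not PostgreSQL functions. The first checks the "every byte has its high bit set" property for one multibyte character. The second shows one way code could avoid depending on how <ctype.h> treats high-bit bytes, by consulting isspace() only for 7-bit values. This is an illustration under those assumptions, not a description of what the server actually does.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns true if every byte of the given multibyte
 * character has its high bit set, i.e. no byte can be mistaken for ASCII.
 */
static bool
all_bytes_high_bit_set(const unsigned char *mbchar, size_t len)
{
    assert(mbchar != NULL);

    for (size_t i = 0; i < len; i++)
    {
        if ((mbchar[i] & 0x80) == 0)
            return false;       /* this byte is in the 7-bit ASCII range */
    }
    return true;
}

/*
 * Hypothetical guard: ask <ctype.h> only about 7-bit values, so the answer
 * for a high-bit byte never depends on the platform's locale tables.
 */
static bool
is_ascii_space(unsigned char c)
{
    return c < 0x80 && isspace(c) != 0;
}
```

For example, the UTF-8 encoding of U+00C5 is the two bytes 0xC3 0x85, both with the high bit set, so it passes the first check. A Shift_JIS-style pair such as 0x83 0x5C fails it, because the trailing byte is the ASCII backslash. With the guard, the byte 0x85 is never reported as whitespace regardless of what the platform's isspace() says about it.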
> Anyway, it looks like this is actually a BSD bug which got copy +
> pasted into Apple's Darwin source -
> http://lists.freebsd.org/pipermail/freebsd-i18n/2007-September/000157.html
Interesting. So the BSD people did fix it upstream?
regards, tom lane