From: | Steven Schlansker <steven(at)trumpet(dot)io> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-bugs(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: COPY FROM/TO losing a single byte of a multibyte UTF-8 sequence |
Date: | 2010-08-19 22:54:36 |
Message-ID: | 34C92DEC-CD89-403C-BB6D-B21012233F0F@trumpet.io |
Lists: | pgsql-bugs pgsql-hackers |
On Aug 19, 2010, at 3:24 PM, Tom Lane wrote:
> Steven Schlansker <steven(at)trumpet(dot)io> writes:
>>
>> I'm not at all experienced with character encodings so I could
>> be totally off base, but isn't it wrong to ever call isspace(0x85),
>> whatever the result may be, given that the actual character is 0xCF85?
>> (U+03C5, GREEK SMALL LETTER UPSILON)
>
> We generally assume that in server-safe encodings, the ctype.h functions
> will behave sanely on any single-byte value. You can argue the wisdom
> of that, but deciding to change that policy would be a rather massive
> code change; I'm not excited about going that direction.
Fair enough. I presume there are no "server-safe encodings" for which
a multibyte sequence 0xXX20 would be valid, since those would break anyway
(the second byte would look like a real space).
> You need a setlocale() call, else the program acts as though it's in C
> locale regardless of environment.
Sigh. I hate C sometimes. :-p
Anyway, it looks like this is actually a BSD bug which got copied and
pasted into Apple's Darwin source:
http://lists.freebsd.org/pipermail/freebsd-i18n/2007-September/000157.html
I have a couple of contacts at Apple so I'll see if there's any interest in
backporting a fix, but I wouldn't hope for it to happen quickly if at all...
Thanks for looking into fixing this. I hope you guys can reach
consensus on how to get it fixed :)
Best,
Steven Schlansker