Re: COPY FROM/TO losing a single byte of a multibyte UTF-8 sequence

From: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: steven(at)trumpet(dot)io, pgsql-bugs(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: COPY FROM/TO losing a single byte of a multibyte UTF-8 sequence
Date: 2010-08-19 23:29:57
Message-ID: 20100820.082957.113300986.t-ishii@sraoss.co.jp
Lists: pgsql-bugs pgsql-hackers

> We generally assume that in server-safe encodings, the ctype.h functions
> will behave sanely on any single-byte value.

I think this "wisdom" is only true for the C locale. I'm not surprised
at all that it does not work with non-C locales.

From array_funcs.c:

    while (isspace((unsigned char) *p))
        p++;

IMO this should be something like:

    while (isspace((unsigned char) *p))
        p += pg_mblen(p);
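
As a rough, self-contained sketch of that loop outside the backend: utf8_mblen() below is a hypothetical stand-in for pg_mblen(), assuming UTF-8 and well-formed, NUL-terminated input. It is not the PostgreSQL implementation, and skip_space_mb() is an illustrative name only. The point is just that when isspace() reports true for a byte, the pointer advances by the whole character that byte starts, so it never lands in the middle of a multibyte sequence.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for pg_mblen(), assuming UTF-8: returns the byte
 * length of the character starting at p, judged from its lead byte only.
 * This is an illustration, not PostgreSQL's code.
 */
static int
utf8_mblen(const char *p)
{
    unsigned char c = (unsigned char) *p;

    if (c < 0x80)
        return 1;               /* plain ASCII */
    if ((c & 0xE0) == 0xC0)
        return 2;               /* lead byte of a 2-byte sequence */
    if ((c & 0xF0) == 0xE0)
        return 3;               /* lead byte of a 3-byte sequence */
    if ((c & 0xF8) == 0xF0)
        return 4;               /* lead byte of a 4-byte sequence */
    return 1;                   /* stray or invalid byte: step over one byte */
}

/*
 * Skip leading whitespace, advancing one whole character at a time.
 * Assumes p points at the start of a character in valid UTF-8 text.
 */
static const char *
skip_space_mb(const char *p)
{
    assert(p != NULL);
    while (isspace((unsigned char) *p))
        p += utf8_mblen(p);
    return p;
}
```

In the C locale isspace() is true only for ASCII whitespace, so this behaves exactly like the p++ version there; the two differ only when a locale's isspace() accepts a byte with the high bit set.
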
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
