Re: EOL characters and multibyte encodings

From:	"William ZHANG" <zedware(at)gmail(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: EOL characters and multibyte encodings
Date:	2007-06-22 08:33:22
Message-ID:	f5g1gh$1b7$1@news.hub.org
Lists:	pgsql-hackers

"Joe Conway" <mail(at)joeconway(dot)com>
> Tom Lane wrote:
>> Joe Conway <mail(at)joeconway(dot)com> writes:
>>> My first thought on fixing this issue was to simply replace all
>>> instances of '\r' in pg_proc.prosrc with '\n' prior to sending it to the
>>> R parser. As far as I know, any instances of '\r' embedded in a
>>> syntactically valid R statement must be escaped (i.e. literally the
>>> characters "\" and "r"), so that should not be a problem. But I am
>>> concerned about how this potentially plays against multibyte characters.
>>> Is it safe to do this, or do I need to use a mb-aware replace algorithm?
>>
>> It's safe, because you'll be dealing with prosrc inside the backend,
>> therefore using a backend-legal encoding, and those don't have any ASCII
>> aliasing problems (all bytes of an MB character must have high bit set).

The trailing byte of some characters in BIG5, GBK, and GB18030 may be less than
0x7F, so it does not have the high bit set. Fortunately, these encodings never
use 0x0D or 0x0A (CR and LF) as a byte of a multibyte character.
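
As a rough illustration of the point above, here is a minimal Python sketch (not PL/R or backend code). It assumes Python's built-in `big5`, `gbk`, and `gb18030` codecs are a fair stand-in for these encodings. The helper names and the sample characters are my own choices for the example. It does a plain byte-wise CR-to-LF replacement and scans the Basic Multilingual Plane for any multibyte character whose encoding contains 0x0D or 0x0A.

```python
# Sample characters assumed to have a trailing byte in the ASCII range.
SAMPLES = {"big5": "功", "gbk": "丂", "gb18030": "丂"}


def replace_cr(src: bytes) -> bytes:
    """Byte-wise replacement of every CR (0x0D) with LF (0x0A)."""
    return src.replace(b"\r", b"\n")


def has_ascii_range_byte(ch: str, encoding: str) -> bool:
    """True if the multibyte encoding of ch contains a byte below 0x80."""
    raw = ch.encode(encoding)
    return len(raw) > 1 and any(b < 0x80 for b in raw)


def multibyte_chars_using_eol(encoding: str) -> list:
    """BMP characters whose multibyte encoding contains 0x0D or 0x0A."""
    hits = []
    for cp in range(0x80, 0x10000):
        if 0xD800 <= cp <= 0xDFFF:  # surrogate code points, skip
            continue
        try:
            raw = chr(cp).encode(encoding)
        except UnicodeEncodeError:
            continue  # not representable in this encoding
        if len(raw) > 1 and (0x0D in raw or 0x0A in raw):
            hits.append(chr(cp))
    return hits


if __name__ == "__main__":
    for enc, ch in SAMPLES.items():
        src = f"x <- '{ch}'\r\ny <- 1\r".encode(enc)
        fixed = replace_cr(src)
        print(enc,
              "ascii-range byte:", has_ascii_range_byte(ch, enc),
              "round-trips:", fixed.decode(enc) == src.decode(enc).replace("\r", "\n"),
              "chars using CR/LF:", len(multibyte_chars_using_eol(enc)))
```

The scan only covers what the Python codecs can encode from the BMP, so it is a sanity check rather than a proof about the encodings themselves.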

Regards,
William ZHANG

> Great -- I wasn't sure about that.
>

In response to

Re: EOL characters and multibyte encodings at 2007-06-21 22:51:13 from Joe Conway

Responses

Re: EOL characters and multibyte encodings at 2007-06-22 12:11:53 from Andrew Dunstan
