Re: Almost bug in COPY FROM processing of GB18030 encoded input

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Almost bug in COPY FROM processing of GB18030 encoded input
Date:	2019-01-24 21:27:11
Message-ID:	CA+Tgmob3Kq0beYh8Myh5=7SEt4+MKhRig6RumaV3=bYqYQqwFw@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Jan 23, 2019 at 6:23 AM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> I happened to notice that when CopyReadLineText() calls mblen(), it
> passes only the first byte of the multi-byte characters. However,
> pg_gb18030_mblen() looks at the first and the second byte.
> CopyReadLineText() always passes \0 as the second byte, so
> pg_gb18030_mblen() will incorrectly report the length of 4-byte encoded
> characters as 2.
>
> It works out fine, though, because the second half of the 4-byte encoded
> character always looks like another 2-byte encoded character, in
> GB18030. CopyReadLineText() is looking for delimiter and escape
> characters and newlines, and only single-byte characters are supported
> for those, so treating a 4-byte character as two 2-byte characters is
> harmless.
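
For readers following along, here is a minimal sketch of the behavior described in the quoted text. It is not the PostgreSQL source: gb18030_mblen_sketch(), mblen_first_byte_only() and scan_first_byte_only() are hypothetical names, and the length rule (second byte in 0x30..0x39 means a 4-byte character) is an assumption based on the GB18030 layout and the description above.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for the length function described above: it
 * inspects the first and second byte. Assumes s[1] is readable.
 */
static int
gb18030_mblen_sketch(const unsigned char *s)
{
    if ((s[0] & 0x80) == 0)
        return 1;               /* high bit clear: single-byte ASCII */
    if (s[1] >= 0x30 && s[1] <= 0x39)
        return 4;               /* second byte is an ASCII digit: 4-byte form */
    return 2;                   /* otherwise: 2-byte form */
}

/*
 * Mimics a caller that hands over only the first byte, followed by \0,
 * so the second-byte check can never report a 4-byte character.
 */
static int
mblen_first_byte_only(unsigned char first)
{
    unsigned char buf[2];

    buf[0] = first;
    buf[1] = '\0';
    return gb18030_mblen_sketch(buf);
}

/*
 * Walks a buffer using first-byte-only lengths. Returns the number of
 * bytes consumed and stores the number of "characters" seen in *nchars.
 * Assumes the buffer holds complete characters (no truncation handling).
 */
static size_t
scan_first_byte_only(const unsigned char *buf, size_t len, int *nchars)
{
    size_t pos = 0;

    *nchars = 0;
    while (pos < len)
    {
        pos += (size_t) mblen_first_byte_only(buf[pos]);
        (*nchars)++;
    }
    return pos;
}
```

With the 4-byte sequence 0x81 0x30 0x81 0x30, the full-lookahead call reports 4, the first-byte-only call reports 2, and the scan still lands on the 4-byte boundary after counting two 2-byte "characters". The second and fourth bytes (ASCII digits) are skipped as trailing bytes, so they are never compared against delimiters.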

Yikes.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Almost bug in COPY FROM processing of GB18030 encoded input at 2019-01-23 11:23:23 from Heikki Linnakangas

Responses

Re: Almost bug in COPY FROM processing of GB18030 encoded input at 2019-01-25 12:56:27 from Heikki Linnakangas
