
Re: BUG #14038: substring cuts unicode char in half, allowing to save broken utf8 into table

From:	Reece Pegues <RPegues(at)tripwire(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject:	Re: BUG #14038: substring cuts unicode char in half, allowing to save broken utf8 into table
Date:	2016-03-21 17:10:45
Message-ID:	D315A115.2CF4F%rpegues@tripwire.com
Lists:	pgsql-bugs

Looks like the database was created with ENCODING = 'SQL_ASCII'.

So I assume it was saving the data that way (as SQL_ASCII), and then, when the
client encoding was UTF8, it tried to convert to that and failed?

-Reece

On 3/21/16, 12:46 PM, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

>rpegues(at)tripwire(dot)com writes:
>> We have a table with an update trigger: if you modify a certain column,
>> we change the name of the row by calling a function.
>> In the function, we substring() the name and then append a random string.
>> However, the substring appears to cut a unicode character in half, and the
>> update trigger then updates the name with the broken string.
>
>That should not happen if Postgres knows it's dealing with unicode data.
>What have you got the database's encoding set to?
>
> regards, tom lane
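To illustrate Reece's guess, here is a minimal sketch assuming that, in a SQL_ASCII database, substring() counts bytes rather than characters, whereas a UTF8 database counts whole characters. Python stands in for the server here; `byte_substring`, `char_substring` and `is_valid_utf8` are hypothetical helpers for illustration, not PostgreSQL functions, and the `_x1y2` suffix is a made-up stand-in for the "random string" from the bug report.

```python
# Illustrative sketch only: Python stands in for the database server.
# Assumption: under SQL_ASCII, substring() operates on bytes, so it can stop
# in the middle of a multibyte UTF-8 character; under UTF8 it counts characters.


def byte_substring(data: bytes, length: int) -> bytes:
    """Hypothetical helper: first `length` bytes, ignoring character boundaries."""
    return data[:length]


def char_substring(text: str, length: int) -> str:
    """Hypothetical helper: first `length` characters (whole characters only)."""
    return text[:length]


def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


name = "café"                       # 'é' takes 2 bytes in UTF-8 (0xC3 0xA9)
raw = name.encode("utf-8")          # 5 bytes: b'caf\xc3\xa9'

cut_bytes = byte_substring(raw, 4)  # stops after 0xC3, mid-character
cut_chars = char_substring(name, 4) # keeps 'é' intact

# Appending a suffix (hypothetical stand-in for the trigger's random string)
# does not repair the dangling lead byte.
broken = cut_bytes + b"_x1y2"

print(cut_bytes, is_valid_utf8(cut_bytes))                  # b'caf\xc3' False
print(cut_chars, is_valid_utf8(cut_chars.encode("utf-8")))  # café True
print(broken, is_valid_utf8(broken))                        # b'caf\xc3_x1y2' False
```

The byte-oriented cut leaves a lone 0xC3 lead byte, which the server stores happily under SQL_ASCII but which any UTF-8 decoder on the client side will reject.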

In response to

Re: BUG #14038: substring cuts unicode char in half, allowing to save broken utf8 into table at 2016-03-21 16:46:30 from Tom Lane

Responses

Re: BUG #14038: substring cuts unicode char in half, allowing to save broken utf8 into table at 2016-03-21 21:04:36 from Tom Lane
