Re: BUG #14038: substring cuts unicode char in half, allowing to save broken utf8 into table

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Reece Pegues <RPegues(at)tripwire(dot)com>
Cc:	"pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject:	Re: BUG #14038: substring cuts unicode char in half, allowing to save broken utf8 into table
Date:	2016-03-21 21:04:36
Message-ID:	1163.1458594276@sss.pgh.pa.us
Lists:	pgsql-bugs

Reece Pegues <RPegues(at)tripwire(dot)com> writes:
> Looks like the database is created with ENCODING = 'SQL_ASCII'

Basically, what that does is defeat all encoding checks inside the
backend; it'll store whatever bytes you give it. So yeah, substring()
is expected to deal in bytes, not characters, in this encoding.
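
To make the byte-versus-character distinction concrete, here is a minimal Python sketch. It only mimics the two substring semantics and is not PostgreSQL code; the helper names (substring_bytes, substring_chars, is_valid_utf8) are hypothetical and chosen for illustration.

```python
# Illustration only: mimics byte-wise vs. character-wise substring semantics.
# None of these helpers exist in PostgreSQL; the names are hypothetical.

def substring_bytes(data: bytes, start: int, length: int) -> bytes:
    """1-based substring that counts bytes, as a single-byte encoding would."""
    return data[start - 1 : start - 1 + length]


def substring_chars(text: str, start: int, length: int) -> str:
    """1-based substring that counts characters, as an encoding-aware one would."""
    return text[start - 1 : start - 1 + length]


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes form a complete, valid UTF-8 sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


original = "a\u00e9"              # 'a' followed by e-acute
raw = original.encode("utf-8")    # e-acute takes two bytes: 0xC3 0xA9

cut_bytes = substring_bytes(raw, 1, 2)       # keeps 'a' plus half of e-acute
cut_chars = substring_chars(original, 1, 2)  # keeps both whole characters
```

Counting bytes leaves a dangling lead byte (0xC3) that is not valid UTF-8 on its own, while counting characters keeps the two-byte sequence intact.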

> So I assume it was thus saving the data that way, and then if the client
> encoding is utf8 it tried to encode to that and failed?

If the client declares its encoding, the backend will verify correct
encoding before transmitting data; but if the database encoding is
SQL_ASCII then no actual conversion happens, only a validity check at
transmit/receive.
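
As a rough model of "validity check but no conversion", the Python sketch below passes stored bytes through unchanged and only verifies that they decode in the client's declared encoding. The function name transmit, the error type, and the unchecked pass-through when no encoding is declared are assumptions of this sketch, not the backend's actual implementation.

```python
from typing import Optional

# Hypothetical model of sending a stored value to a client; not backend code.


def transmit(stored: bytes, client_encoding: Optional[str]) -> bytes:
    """Return the bytes to send, checking validity but never converting."""
    if client_encoding is None:
        # Assumption: with no declared client encoding there is nothing
        # to check against, so the bytes go out as stored.
        return stored
    try:
        # Decode only to verify validity; the decoded result is discarded.
        stored.decode(client_encoding)
    except UnicodeDecodeError as exc:
        raise ValueError(
            f"invalid byte sequence for encoding {client_encoding}"
        ) from exc
    # No conversion: the same bytes are sent that were stored.
    return stored


whole = "a\u00e9".encode("utf-8")   # complete two-character string
broken = whole[:2]                  # 'a' plus the first byte of e-acute

sent = transmit(whole, "utf-8")     # passes the check, bytes unchanged
unchecked = transmit(broken, None)  # no declared encoding, no check

try:
    transmit(broken, "utf-8")
    rejected = False
except ValueError:
    rejected = True
```

In this model the truncated value can be stored and even sent back to a client that declares nothing, but it is rejected as soon as a client declaring UTF-8 asks for it.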

regards, tom lane

In response to

Re: BUG #14038: substring cuts unicode char in half, allowing to save broken utf8 into table at 2016-03-21 17:10:45 from Reece Pegues

Responses

Re: BUG #14038: substring cuts unicode char in half, allowing to save broken utf8 into table at 2016-03-22 00:47:23 from Reece Pegues
