Implementing full UTF-8 support (aka supporting 0x00)

From: Álvaro Hernández Tortosa <aht(at)8kdata(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Implementing full UTF-8 support (aka supporting 0x00)
Date: 2016-08-03 14:54:52
Message-ID: 5aa1df8a-96f5-1d14-46fd-032e32846c71@8kdata.com
Lists: pgsql-hackers


Hi list.

As has been previously discussed (see
https://www.postgresql.org/message-id/BAY7-F17FFE0E324AB3B642C547E96890%40phx.gbl
for instance), varlena fields cannot accept the literal 0x00 value; a
short demonstration follows the list below. Sure, you can use bytea, but
that is hardly a good solution. The problem seems to be hitting some use
cases, like:

- People migrating data from other databases (apart from PostgreSQL, I
don't know of any other database that suffers from this problem).
- People using drivers that use UTF-8 or equivalent encodings by default
(Java, for example).
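
Here is a short demonstration of the rejection (my own sketch, not from
the earlier thread; it assumes a UTF8-encoded database and a
hypothetical table t(v text)). Even when the client states the length
explicitly, as libpq's binary parameter format does, the server rejects
the byte during encoding validation:

    #include <stdio.h>
    #include <libpq-fe.h>

    int main(void)
    {
        PGconn *conn = PQconnectdb("");  /* settings from the environment */
        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "%s", PQerrorMessage(conn));
            return 1;
        }

        /* "a\0b": three bytes with an embedded NUL, length passed explicitly */
        const char nul_str[] = { 'a', '\0', 'b' };
        const char *values[1] = { nul_str };
        int lengths[1] = { sizeof(nul_str) };
        int formats[1] = { 1 };             /* 1 = binary-format parameter */

        PGresult *res = PQexecParams(conn, "INSERT INTO t (v) VALUES ($1)",
                                     1, NULL, values, lengths, formats, 0);

        /* Fails with: ERROR:  invalid byte sequence for encoding "UTF8": 0x00 */
        fprintf(stderr, "%s", PQresultErrorMessage(res));

        PQclear(res);
        PQfinish(conn);
        return 0;
    }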

Given that 0x00 is a perfectly legal UTF-8 character (the single-byte
encoding of U+0000), I conclude we're strictly non-compliant. And given
the general Postgres policy regarding standards compliance and the
people being hit by this, I think it should be addressed. Especially
since all the usual workarounds are a real PITA: re-parsing and
re-generating strings, which is very expensive, or dropping data.

What would it take to support it? Isn't the varlena header, which
carries an explicit length word, propagated everywhere, so that the
real length of the string could be taken from it rather than from a
terminating NUL? (A sketch of the relevant pieces follows.) Any
pointers or suggestions would be welcome.
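
For reference, the storage side already carries an explicit length, so
an embedded 0x00 is representable inside a varlena itself; as far as I
can tell, the trouble is the many code paths that round-trip values
through NUL-terminated C strings. A simplified sketch of the relevant
pieces (abridged from src/include/postgres.h and
src/backend/utils/adt/varlena.c):

    /* src/include/postgres.h: the varlena layout (abridged) */
    struct varlena
    {
        char    vl_len_[4];   /* 4-byte length word; read via VARSIZE() */
        char    vl_dat[FLEXIBLE_ARRAY_MEMBER];   /* payload bytes */
    };

    /* The length word is explicit, so vl_dat may contain 0x00 bytes.
     * But conversions to and from C strings lose that property, e.g.
     * the text input path in src/backend/utils/adt/varlena.c: */
    text *
    cstring_to_text(const char *s)
    {
        /* strlen() stops at the first 0x00, silently truncating */
        return cstring_to_text_with_len(s, strlen(s));
    }

So the header is there; presumably the question is how much code that
assumes cstring conversions (type input/output functions, the parser,
the wire protocol) would need auditing.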

Thanks,

Álvaro

--

Álvaro Hernández Tortosa

-----------
8Kdata
