Re: Implementing full UTF-8 support (aka supporting 0x00)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Álvaro Hernández Tortosa <aht(at)8kdata(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Implementing full UTF-8 support (aka supporting 0x00)
Date: 2016-08-03 15:23:41
Message-ID: 24883.1470237821@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

=?UTF-8?Q?=c3=81lvaro_Hern=c3=a1ndez_Tortosa?= <aht(at)8kdata(dot)com> writes:
> As has been previously discussed (see
> https://www.postgresql.org/message-id/BAY7-F17FFE0E324AB3B642C547E96890%40phx.gbl
> for instance) varlena fields cannot accept the literal 0x00 value.

Yup.

> What would it take to support it?

One key reason why that's hard is that datatype input and output
functions use nul-terminated C strings as the representation of the
text form of any datatype. We can't readily change that API without
breaking huge amounts of code, much of it not under the core project's
control.

There may be other places where nul-terminated strings would be a hazard
(mumble fgets mumble), but offhand that API seems like the major problem
so far as the backend is concerned.

There would be a slew of client-side problems as well. For example this
would assuredly break psql and pg_dump, along with every other client that
supposes that it can treat PQgetvalue() as returning a nul-terminated
string. This end of it would possibly be even worse than fixing the
backend, because so little of the affected code is under our control.

In short, the problem is not with having an embedded nul in a stored
text value. The problem is the reams of code that suppose that the
text representation of any data value is a nul-terminated C string.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kevin Grittner 2016-08-03 15:27:28 Re: Wanting to learn about pgsql design decision
Previous Message Dave Cramer 2016-08-03 15:14:33 Re: regression test for extended query protocol