Re: Pre-proposal: unicode normalized text

From: Nico Williams <nico(at)cryptonector(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Isaac Morland <isaac(dot)morland(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Chapman Flack <chap(at)anastigmatix(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Pre-proposal: unicode normalized text
Date: 2023-10-05 19:14:54
Message-ID: ZR8LLrk9AJVxEFbX@ubby21
Lists: pgsql-hackers

On Thu, Oct 05, 2023 at 07:31:54AM -0400, Robert Haas wrote:
> [...] On the other hand, to do that in PostgreSQL, we'd need to
> propagate the character set/encoding information into all of the
> places that currently get the typmod and collation, and that is not a
> small number of places. It's a lot of infrastructure for the project
> to carry around for a feature that's probably only going to continue
> to become less relevant.

Text+encoding can be just like bytea with a one- or two-byte prefix
indicating what codeset+encoding it's in. That'd be how to encode
such text values on the wire, though on disk the column's type should
indicate the codeset+encoding, so no need to add a prefix to the value.
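
To make the wire-vs-disk distinction concrete, here is a minimal sketch
in Python (not PostgreSQL code). It assumes a one-byte prefix; the tag
values and the names ENCODING_TAGS, to_wire, from_wire and to_disk are
hypothetical, invented for illustration only. No such wire format
exists in PostgreSQL today.

```python
# Hypothetical one-byte tags; PostgreSQL defines no such wire format.
ENCODING_TAGS = {
    1: "utf-8",
    2: "latin-1",
    3: "shift_jis",
}
TAG_FOR_ENCODING = {name: tag for tag, name in ENCODING_TAGS.items()}


def to_wire(payload: bytes, encoding: str) -> bytes:
    """Prefix the raw bytes with a one-byte tag naming their encoding."""
    return bytes([TAG_FOR_ENCODING[encoding]]) + payload


def from_wire(message: bytes) -> tuple[str, bytes]:
    """Split a wire value into (encoding name, raw bytes); no conversion."""
    if not message:
        raise ValueError("empty wire value: missing encoding tag")
    tag = message[0]
    if tag not in ENCODING_TAGS:
        raise ValueError(f"unknown encoding tag {tag}")
    return ENCODING_TAGS[tag], message[1:]


def to_disk(message: bytes, column_encoding: str) -> bytes:
    """Drop the tag for storage; the column's type records the encoding."""
    encoding, payload = from_wire(message)
    if encoding != column_encoding:
        raise ValueError("value encoding does not match column encoding")
    return payload
```

The bytes are never transcoded here: the tag is added or stripped and
the payload passes through untouched, just as it would for bytea.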

Complexity would creep in around when and whether to perform automatic
conversions. The easy answer would be "never, on the server side", but
on the client side it might be useful to convert to/from the locale's
codeset+encoding when displaying to the user or accepting user input.

If there are no automatic server-side codeset/encoding conversions, then
the server-side cost of supporting non-UTF-8 text should not be too high
dev-wise -- it's just (famous last words) a generic text type
parameterized by codeset+encoding. There would not even be a hard
need for conversion functions, though there would be demand for
them.
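
As a rough illustration of "a generic text type parameterized by
codeset+encoding" with no automatic conversions, here is a small
Python sketch. The EncodedText class and the convert function are
hypothetical names, not anything in PostgreSQL; the point is only that
mixing encodings is an error unless the caller converts explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncodedText:
    """Raw bytes plus the name of the encoding they are in."""
    data: bytes
    encoding: str

    def __post_init__(self):
        # Validate on construction so invalid byte sequences are rejected.
        self.data.decode(self.encoding)

    def concat(self, other: "EncodedText") -> "EncodedText":
        # No automatic conversion: mixing encodings is an error.
        if other.encoding != self.encoding:
            raise TypeError(
                f"cannot mix {self.encoding} and {other.encoding}"
            )
        return EncodedText(self.data + other.data, self.encoding)


def convert(value: EncodedText, target: str) -> EncodedText:
    """Explicit, opt-in conversion; nothing calls this implicitly."""
    text = value.data.decode(value.encoding)
    return EncodedText(text.encode(target), target)
```

Under this sketch the type itself never needs convert; it is the
optional extra that users would presumably ask for.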

But I agree that if there's no need, there's no need. UTF-8 is great,
and if all PG users would just switch, there would not be much more to
do.

Nico
--
