Re: Pre-proposal: unicode normalized text

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Nico Williams <nico(at)cryptonector(dot)com>
Cc:	Isaac Morland <isaac(dot)morland(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Chapman Flack <chap(at)anastigmatix(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Pre-proposal: unicode normalized text
Date:	2023-10-06 17:33:06
Message-ID:	CA+TgmoaF_KHkgcisLFaKzR_husGtAOnjEez9biH2QHyX-4dAyA@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Oct 5, 2023 at 3:15 PM Nico Williams <nico(at)cryptonector(dot)com> wrote:
> Text+encoding can be just like bytea with a one- or two-byte prefix
> indicating what codeset+encoding it's in. That'd be how to encode
> such text values on the wire, though on disk the column's type should
> indicate the codeset+encoding, so no need to add a prefix to the value.

Well, that would be making the encoding a per-value property, rather
than a per-column property like collation as I proposed. I can't see
that working out very nicely, because collations are
encoding-specific. It wouldn't make any sense if the column collation
were en_US.UTF8 or ko_KR.eucKR or en_CA.ISO8859-1 (just to pick a few
values that are legal on my machine) while data stored in the column
was from a whole bunch of different encodings, at most one of which
could be the one to which the column's collation applied. That would
end up meaning, for example, that such a column was very hard to sort.
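
To illustrate the sorting problem, here is a minimal sketch in Python. It is
not PostgreSQL code; the names mixed_column and sort_key are hypothetical, and
the tagged tuples only stand in for a per-value encoding prefix. It shows that
the same text has different bytes in different encodings, so values can only
be compared after each one is decoded according to its own tag.

```python
# Hypothetical model of a column whose values each carry their own encoding tag.
mixed_column = [
    ("utf-8", "zebra".encode("utf-8")),
    ("latin-1", "école".encode("latin-1")),
    ("utf-8", "école".encode("utf-8")),
]

# The same text is stored as different bytes depending on the encoding,
# so a plain byte-wise comparison treats equal strings as unequal.
latin1_bytes = mixed_column[1][1]
utf8_bytes = mixed_column[2][1]
bytes_match = latin1_bytes == utf8_bytes


def sort_key(tagged):
    """Decode a value with its own encoding tag before comparing it."""
    encoding, raw = tagged
    return raw.decode(encoding)


# Sorting must decode every value first; the result is plain code point
# order, not the order any locale-aware collation would produce.
sorted_mixed = sorted(mixed_column, key=sort_key)
texts_match = sort_key(mixed_column[1]) == sort_key(mixed_column[2])
```

Even with the per-value decode step, no single collation tied to one encoding
can be applied to the raw stored values, which is the difficulty described
above.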

For that and other reasons, I suspect that the utility of storing data
from a variety of different encodings in the same database column is
quite limited. What I think people really want is a whole column in
some encoding that isn't the normal one for that database. That's not
to say we should add such a feature, but if we do, I think it should
be that, not a different encoding for every individual value.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Re: Pre-proposal: unicode normalized text at 2023-10-05 19:14:54 from Nico Williams

Responses

Re: Pre-proposal: unicode normalized text at 2023-10-06 17:38:45 from Nico Williams
Re: Pre-proposal: unicode normalized text at 2023-10-06 19:07:17 from Jeff Davis
