Re: Unicode Normalization

From:	"David E(dot) Wheeler" <david(at)kineticode(dot)com>
To:	pg1(at)thetdh(dot)com
Cc:	"PG Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Unicode Normalization
Date:	2009-09-24 15:36:37
Message-ID:	9BD6C83B-018E-4263-9EC8-33344FEDF655@kineticode.com
Lists:	pgsql-hackers

On Sep 24, 2009, at 6:24 AM, pg(at)thetdh(dot)com wrote:

> In a context using normalization, wouldn't you typically want to
> store a normalized-text type that could perhaps (depending on
> locale) take advantage of simpler, more-efficient comparison
> functions?

That might be nice, but I'd be wary of a geometric multiplication of
text types. We already have TEXT and CITEXT; what if we had your NTEXT
(normalized text) but I wanted it to also be case-insensitive?
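To illustrate the concern, here is a minimal sketch in Python (not PostgreSQL code). It assumes normalization and case-insensitivity are treated as independent transformations applied when building a comparison key, so that each combination does not need its own type. The function name `comparison_key` and its flags are hypothetical, and casefolding followed by NFC is a simplification of full Unicode caseless matching.

```python
import unicodedata

def comparison_key(s, normalize=True, ignore_case=False):
    """Build a comparison key from independent, composable options.

    Hypothetical helper: each flag is a separate transformation, so
    "normalized and case-insensitive" is just both flags set rather
    than a fourth text type.
    """
    if ignore_case:
        # casefold() is Python's aggressive lowercasing for caseless matching
        s = s.casefold()
    if normalize:
        # NFC: canonical composition (e + combining acute -> single code point)
        s = unicodedata.normalize("NFC", s)
    return s

composed = "Caf\u00e9"      # "Café" with a precomposed e-acute
decomposed = "cafe\u0301"   # "café" with e + combining acute accent

# Normalization alone does not ignore case.
same_normalized = comparison_key(composed) == comparison_key(decomposed)

# Case-insensitivity alone does not unify the two encodings of e-acute.
same_caseless = (comparison_key(composed, normalize=False, ignore_case=True)
                 == comparison_key(decomposed, normalize=False, ignore_case=True))

# Both options together make the two strings compare equal.
same_both = (comparison_key(composed, ignore_case=True)
             == comparison_key(decomposed, ignore_case=True))
```

With two independent options there are four combinations, which is the multiplication of types that a type-per-behavior design would run into.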

> Whether you're doing INSERT/UPDATE, or importing a flat text file,
> if you canonicalize characters and substrings of identical meaning
> when trivial distinctions of encoding are irrelevant, you're better
> off later. User-invocable normalization functions by themselves
> don't make much sense.

Well, they make sense because there's nothing else right now. It's an
easy way to get some support in, and besides, it's mandated by the SQL
standard.
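As a rough illustration of what a user-invocable normalization function buys you, the sketch below uses Python's standard `unicodedata.normalize` as a stand-in. It is not a PostgreSQL function, and the helper name `nfc_equal` is made up for this example.

```python
import unicodedata

composed = "\u00e9"       # e-acute as one code point
decomposed = "e\u0301"    # "e" followed by a combining acute accent

# The two strings render identically but differ code point by code point.
raw_equal = composed == decomposed

def nfc_equal(a, b):
    """Compare two strings after normalizing both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# After explicit normalization the canonical equivalents compare equal.
normalized_equal = nfc_equal(composed, decomposed)

# NFD goes the other way and splits the precomposed character in two.
nfd_length = len(unicodedata.normalize("NFD", composed))
```

Calling such a function explicitly at INSERT/UPDATE or import time gives the canonicalization discussed above without requiring a dedicated type.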

> (If Postgres now supports binary- or mixed-binary-and-text flat
> files, perhaps for restore purposes, the same thing applies.)

I don't follow this bit.

Best,

David

In response to

Re: Unicode Normalization at 2009-09-24 13:24:07 from pg

Responses

Re: Unicode Normalization at 2009-09-24 15:59:09 from Andrew Dunstan
