Unicode Normalization

From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Unicode Normalization
Date: 2009-09-23 18:08:14
Message-ID: DAD699B0-D72F-42EE-A350-F6E78E82F192@kineticode.com
Lists: pgsql-hackers

Hackers,

I just had a discussion on IRC about Unicode normalization in
PostgreSQL. Apparently there is currently no support for it. Andrew
Gierth points out that supporting it is part of the SQL spec, though:

> RhodiumToad: e.g. NORMALIZE(foo,NFC,len)
> justatheory: Oh, just a function then, really.
> RhodiumToad: where the normal form can be any of NFC, NFD, NFKC, NFKD
> RhodiumToad: except that the normal form is an identifier, not a string
> RhodiumToad: also the normal form and length are optional
> RhodiumToad: so NORMALIZE(foo) is equivalent to NORMALIZE(foo,NFC)
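
To make the defaulting rule concrete, here is a minimal C sketch of how the optional normal-form argument might be mapped to an enum. The names NormalForm, ident_equals and parse_normal_form are hypothetical, and the case-insensitive match is my assumption about how an identifier would be compared; none of this is existing PostgreSQL code.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical enum for the four normal forms named in the IRC log. */
typedef enum { NF_INVALID = -1, NF_NFC, NF_NFD, NF_NFKC, NF_NFKD } NormalForm;

/* Case-insensitive comparison of two ASCII identifiers (assumed behavior). */
static int ident_equals(const char *a, const char *b)
{
    while (*a && *b) {
        if (toupper((unsigned char) *a) != toupper((unsigned char) *b))
            return 0;
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Map the optional normal-form identifier to an enum value.
 * NULL stands for "argument omitted", which defaults to NFC,
 * so NORMALIZE(foo) behaves like NORMALIZE(foo,NFC). */
static NormalForm parse_normal_form(const char *ident)
{
    static const struct { const char *name; NormalForm form; } forms[] = {
        {"NFC", NF_NFC}, {"NFD", NF_NFD}, {"NFKC", NF_NFKC}, {"NFKD", NF_NFKD},
    };
    size_t i;

    if (ident == NULL)
        return NF_NFC;
    for (i = 0; i < sizeof forms / sizeof forms[0]; i++)
        if (ident_equals(ident, forms[i].name))
            return forms[i].form;
    return NF_INVALID;  /* unknown identifier: caller should raise an error */
}
```

In a real patch the grammar would presumably resolve the identifier at parse time; the sketch only shows the mapping and the NFC default.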

I looked around and found the Public Software Group's utf8proc  
project, which even includes some PostgreSQL support (not, alas, for  
normalization). It has an MIT-licensed C library that offers these  
functions:

> uint8_t *utf8proc_NFD(uint8_t *str)
>
> Returns a pointer to newly allocated memory of a NFD normalized
> version of the null-terminated string str.
>
> uint8_t *utf8proc_NFC(uint8_t *str)
>
> Returns a pointer to newly allocated memory of a NFC normalized
> version of the null-terminated string str.
>
> uint8_t *utf8proc_NFKD(uint8_t *str)
>
> Returns a pointer to newly allocated memory of a NFKD normalized
> version of the null-terminated string str.
>
> uint8_t *utf8proc_NFKC(uint8_t *str)
>
> Returns a pointer to newly allocated memory of a NFKC normalized
> version of the null-terminated string str.
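
Since all four functions share one shape and hand back newly allocated memory, a wrapper could dispatch through a function pointer, copy the result into its own buffer, and free the library's copy. The following is only a rough sketch under stated assumptions: normalize_fn, normalize_copy and identity_stub are hypothetical names, the const qualifier on the parameter is an assumption (adjust it to the header you build against), treating a NULL return as failure is also an assumption, and plain malloc stands in for whatever allocator the backend would actually use.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Shape assumed for the four quoted functions: take a null-terminated
 * UTF-8 string, return newly allocated memory owned by the caller. */
typedef uint8_t *(*normalize_fn)(const uint8_t *str);

/* Stand-in so the sketch compiles without utf8proc: it just duplicates
 * its input. A real build would pass utf8proc_NFC and friends instead. */
static uint8_t *identity_stub(const uint8_t *str)
{
    size_t n = strlen((const char *) str);
    uint8_t *copy = malloc(n + 1);

    if (copy != NULL)
        memcpy(copy, str, n + 1);
    return copy;
}

/* Hypothetical helper: run `fn` on `in`, copy the result into memory we
 * allocate ourselves, and release the buffer the library returned.
 * Returns NULL if `fn` fails (assumed to be signaled by NULL) or if
 * allocation fails. */
static char *normalize_copy(normalize_fn fn, const char *in)
{
    uint8_t *tmp;
    char *out;
    size_t n;

    if (fn == NULL || in == NULL)
        return NULL;
    tmp = fn((const uint8_t *) in);
    if (tmp == NULL)
        return NULL;
    n = strlen((const char *) tmp);
    out = malloc(n + 1);            /* stand-in for the backend allocator */
    if (out != NULL)
        memcpy(out, tmp, n + 1);    /* include the terminating NUL */
    free(tmp);                      /* library result was newly allocated */
    return out;
}
```

A call would look like normalize_copy(identity_stub, "abc"), with the stub swapped for the real function once the library is linked in. The extra copy is there because the library's result has to be freed separately from memory the backend manages.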

Anyone got any interest in porting these functions to PostgreSQL? I  
guess the parser would need to be updated to support the use of  
identifiers in the NORMALIZE() function, but otherwise it should be a  
fairly straightforward port for an experienced C coder, no?

Best,

David
