Re: Unicode combining characters

From: Patrice Hédé <phede-ml(at)islande(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Unicode combining characters
Date: 2001-09-25 18:14:20
Message-ID: 20010925201420.O1316@idf.net
Lists: pgsql-hackers

Hi,

* Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp> [010925 18:18]:
> > So, this shows two problems:
> >
> > - length() on the server side doesn't handle Unicode correctly [I
> > have the same result with char_length()], and returns the number
> > of chars (as it is however advertised to do), rather than the length
> > of the string.
>
> This is a known limitation.

To solve this, we could use wcwidth() (a custom implementation is
available for systems whose libc doesn't provide it). I'll have a
look at it later.
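
As a rough illustration of what a wcwidth()-style calculation buys us,
here is a minimal sketch in Python (not the C code that would go into
psql or the backend). The function name display_width is hypothetical,
and the rules are simplified assumptions: non-spacing and enclosing
marks count as zero columns, East Asian wide and fullwidth characters
as two, everything else as one. Control characters, which wcwidth()
treats specially, are not handled.

```python
import unicodedata


def display_width(text):
    """Approximate the number of terminal columns needed to show text.

    Simplified assumption: only combining marks (categories Mn and Me)
    are zero-width, and only East Asian Wide/Fullwidth characters take
    two columns. A real wcwidth() has more special cases.
    """
    width = 0
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me"):
            # Combining mark: drawn on top of the previous character.
            continue
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2
        else:
            width += 1
    return width


# "e" followed by U+0301 COMBINING ACUTE ACCENT: two code points,
# but a single column on screen.
sample = "e\u0301"
code_points = len(sample)
columns = display_width(sample)
```

Here len() gives the count of code points, which is what length()
reports today, while display_width() gives the figure psql would need
to align its output.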

> > - the psql frontend makes the same mistake.

Same thing here.

I have just installed the CVS and downloaded the development version
(thanks Baldvin), tested that the stock version compiles fine, and
I'll now have a look at how to make this work. :) I'll send a patch
when I have this working here.

> Sounds great.

[Unicode normalisation and collation in the backend]

> I'm very interested in your point. I will start studying [1][2] after
> the beta freeze.
>
> > Anyway, I'm open to suggestions :
> >
> > - implement it in C, in the core,
> >
> > - implement it in C, as contributed custom functions,
>
> This may be a good starting point.
>
> > I can't really accept a solution which would rely on the underlying
> > libc, as it may not provide the necessary locales (or maybe, then,
>
> I totally agree here.

As Oleg suggested, I will try to aim for 7.3, first with a version in
contrib, and later, if the implementation is fine, it could be moved
to the core (or not, though it would be nice to make sure every
PostgreSQL installation which supports Unicode has it, so that users
won't need administrative rights to use the functionality).

I think I will go for a C version, and probably keep the collation and
normalisation data in tables, with some way to override the defaults
with secondary tables... I'll report as soon as I have something more
or less working.
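
To make the "default table plus secondary override table" idea a bit
more concrete, here is a small sketch, again in Python for brevity
rather than the C it would eventually be written in. Everything in it
is an assumption made up for illustration: the weights, the names
DEFAULT_WEIGHTS, SWEDISH_OVERRIDES and sort_key, and the two-level
(primary, secondary) key. Only unicodedata.normalize() and ChainMap
are real library calls.

```python
import unicodedata
from collections import ChainMap

# Hypothetical default weights: (primary, secondary) per code point.
# A real table would be generated from Unicode collation data.
DEFAULT_WEIGHTS = {
    "a": (1, 0),
    "\u00e4": (1, 1),  # a with diaeresis: same primary as "a", differs at level 2
    "e": (2, 0),
    "z": (3, 0),
}

# Hypothetical secondary table: Swedish-style rule where the letter
# a-with-diaeresis sorts after "z" as a letter of its own.
SWEDISH_OVERRIDES = {
    "\u00e4": (4, 0),
}

# Characters missing from every table sort after the known ones,
# in code point order.
FALLBACK_BASE = 0x10000


def sort_key(text, overrides=None):
    """Build a two-level sort key: all primary weights, then all secondary."""
    # The override table is consulted first, then the defaults.
    table = ChainMap(overrides or {}, DEFAULT_WEIGHTS)
    # Normalise first so precomposed and decomposed spellings compare equal.
    normalised = unicodedata.normalize("NFC", text)
    weights = [table.get(ch, (FALLBACK_BASE + ord(ch), 0)) for ch in normalised]
    primary = tuple(w[0] for w in weights)
    secondary = tuple(w[1] for w in weights)
    return (primary, secondary)


words = ["z", "\u00e4", "a"]
default_order = sorted(words, key=sort_key)
swedish_order = sorted(words, key=lambda w: sort_key(w, SWEDISH_OVERRIDES))
```

With the defaults the accented letter sorts right after "a"; with the
override table it moves past "z", without touching the default data.
The normalisation step is what lets "a" + U+0308 and the precomposed
U+00E4 produce the same key.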

> --
> Tatsuo Ishii

Patrice.

--
Patrice HÉDÉ ------------------------------- patrice à islande org -----
-- Isn't it weird how scientists can imagine all the matter of the
universe exploding out of a dot smaller than the head of a pin, but they
can't come up with a more evocative name for it than "The Big Bang" ?
-- What would _you_ call the creation of the universe ?
-- "The HORRENDOUS SPACE KABLOOIE !" - Calvin and Hobbes
------------------------------------------ http://www.islande.org/ -----
