Re: [HACKERS] Unicode combining characters

From: Patrice Hédé <phede-ml(at)islande(dot)org>
To: PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] Unicode combining characters
Date: 2001-10-08 19:35:44
Message-ID: 20011008213543.E14587@idf.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

Hi,

I should have sent the patch earlier, but got delayed by other stuff.
Anyway, here is the patch:

- most of the functionality is only activated when MULTIBYTE is
defined,

- check valid UTF-8 characters, client-side only yet, and only on
output, you still can send invalid UTF-8 to the server (so, it's
only partly compliant to Unicode 3.1, but that's better than
nothing).

- formats with the correct number of columns (that's why I made it in
the first place after all), but only for UNICODE. However, the code
allows to plug-in routines for other encodings, as Tatsuo did for
the other multibyte functions.

- corrects a bit the UTF-8 code from Tatsuo to allow Unicode 3.1
characters (characters with values >= 0x10000, which are encoded on
four bytes).

- doesn't depend on the locale capabilities of the glibc (useful for
remote telnet).

I would like somebody to check it closely, as it is my first patch to
pgsql. Also, I created dummy .orig files, so that the two files I
created are included, I hope that's the right way.

Now, a lot of functionality is NOT included here, but I will keep that
for 7.3 :) That includes all string checking on the server side (which
will have to be a bit more optimised ;) ), and the input checking on
the client side for UTF-8, though that should not be difficult. It's
just to send the strings through mbvalidate() before sending them to
the server. Strong checking on UTF-8 strings is mandatory to be
compliant with Unicode 3.1+ .

Do I have time to look for a patch to include iso-8859-15 for 7.2 ?
The euro is coming 1. january 2002 (before 7.3 !) and over 280
millions people in Europe will need the euro sign and only iso-8859-15
and iso-8859-16 have it (and unfortunately, I don't think all Unices
will switch to Unicode in the meantime)....

err... yes, I know that this is not every single person in Europe that
uses PostgreSql, so it's not exactly 280m, but it's just a matter of
time ! ;)

I'll come back (on pgsql-hackers) later to ask a few questions
regarding the full unicode support (normalisation, collation,
regexes,...) on the server side :)

Here is the patch !

Patrice.

--
Patrice HÉDÉ ------------------------------- patrice à islande org -----
-- Isn't it weird how scientists can imagine all the matter of the
universe exploding out of a dot smaller than the head of a pin, but they
can't come up with a more evocative name for it than "The Big Bang" ?
-- What would _you_ call the creation of the universe ?
-- "The HORRENDOUS SPACE KABLOOIE !" - Calvin and Hobbes
------------------------------------------ http://www.islande.org/ -----

Attachment Content-Type Size
wcwidth.patch text/plain 22.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2001-10-08 21:02:06 Beta Wednesday
Previous Message Ryan 2001-10-08 19:28:20 Postgres server locks up

Browse pgsql-patches by date

  From Date Subject
Next Message Peter Eisentraut 2001-10-08 23:25:00 Re: Place of PO files for NLS (was Re: [PATCHES] PG_DUMP NLS (Russian))
Previous Message steve 2001-10-08 16:34:51 pg_dump oid problems