Re: [PATCHES] Unicode combining characters

From: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
To: phede-ml(at)islande(dot)org
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] Unicode combining characters
Date: 2001-10-15 01:26:19
Message-ID: 20011015102619W.t-ishii@sra.co.jp
Lists: pgsql-hackers pgsql-patches

I have committed part of Patrice's patches with minor fixes.
The uncommitted changes concern the backend side; the reason can be
found in the previous discussions (basically, the current regex code
does not support UTF-8 chars >= 0x10000). Instead, pg_verifymbstr()
now rejects UTF-8 chars >= 0x10000.
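
To illustrate the kind of check described above, here is a minimal sketch
in C. It is not the committed code: the names utf8_bmp_seq_len() and
verify_utf8_bmp_only() are hypothetical, and the sketch only assumes that
"rejecting chars >= 0x10000" means refusing any four-byte UTF-8 sequence.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not PostgreSQL's actual verifier).
 * Returns the byte length (1..3) of the UTF-8 sequence starting at s,
 * or -1 if the sequence is malformed, truncated, overlong, a surrogate,
 * or encodes a code point >= 0x10000 (four-byte lead byte).
 */
static int
utf8_bmp_seq_len(const unsigned char *s, size_t remaining)
{
    if (remaining == 0)
        return -1;
    if (s[0] < 0x80)
        return 1;                       /* ASCII */
    if ((s[0] & 0xE0) == 0xC0)
    {
        if (s[0] < 0xC2)
            return -1;                  /* overlong two-byte form */
        if (remaining < 2 || (s[1] & 0xC0) != 0x80)
            return -1;
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0)
    {
        if (remaining < 3 ||
            (s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80)
            return -1;
        if (s[0] == 0xE0 && s[1] < 0xA0)
            return -1;                  /* overlong three-byte form */
        if (s[0] == 0xED && s[1] >= 0xA0)
            return -1;                  /* UTF-16 surrogate range */
        return 3;
    }
    /* 0xF0 and above would start a code point >= 0x10000; a bare
     * continuation byte (0x80..0xBF) also ends up here. Reject both. */
    return -1;
}

/* Returns 1 if the buffer is valid UTF-8 limited to code points below
 * 0x10000, and 0 otherwise. */
int
verify_utf8_bmp_only(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len)
    {
        int n = utf8_bmp_seq_len(s + i, len - i);

        if (n < 0)
            return 0;
        i += (size_t) n;
    }
    return 1;
}
```

With this sketch, the three-byte euro sign (E2 82 AC) is accepted, while
U+10000 (F0 90 80 80) is rejected because of its four-byte lead byte.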
--
Tatsuo Ishii

> Hi,
>
> I should have sent the patch earlier, but got delayed by other stuff.
> Anyway, here is the patch:
>
> - most of the functionality is only activated when MULTIBYTE is
> defined,
>
> - checks for valid UTF-8 characters, client-side only for now and
> only on output; you can still send invalid UTF-8 to the server (so
> it is only partly compliant with Unicode 3.1, but that's better
> than nothing).
>
> - formats with the correct number of columns (that's why I made it in
> the first place after all), but only for UNICODE. However, the code
> allows routines for other encodings to be plugged in, as Tatsuo did
> for the other multibyte functions.
>
> - slightly corrects Tatsuo's UTF-8 code to allow Unicode 3.1
> characters (characters with values >= 0x10000, which are encoded in
> four bytes).
>
> - doesn't depend on the locale capabilities of the glibc (useful for
> remote telnet).
>
> I would like somebody to check it closely, as it is my first patch to
> pgsql. Also, I created dummy .orig files, so that the two files I
> created are included, I hope that's the right way.
>
> Now, a lot of functionality is NOT included here, but I will keep that
> for 7.3 :) That includes all string checking on the server side (which
> will have to be a bit more optimised ;) ), and the input checking on
> the client side for UTF-8, though that should not be difficult: it is
> just a matter of sending the strings through mbvalidate() before
> sending them to the server. Strong checking of UTF-8 strings is
> mandatory to be compliant with Unicode 3.1+.
>
> Do I have time to prepare a patch to include iso-8859-15 for 7.2?
> The euro is coming on 1 January 2002 (before 7.3!) and over 280
> million people in Europe will need the euro sign, and only iso-8859-15
> and iso-8859-16 have it (and unfortunately, I don't think all Unices
> will switch to Unicode in the meantime)....
>
> err... yes, I know that not every single person in Europe uses
> PostgreSQL, so it's not exactly 280m, but it's just a matter of
> time! ;)
>
> I'll come back (on pgsql-hackers) later to ask a few questions
> regarding full Unicode support (normalisation, collation,
> regexes,...) on the server side :)
>
> Here is the patch !
>
> Patrice.
>
> --
> Patrice HD ------------------------------- patrice islande org -----
> -- Isn't it weird how scientists can imagine all the matter of the
> universe exploding out of a dot smaller than the head of a pin, but they
> can't come up with a more evocative name for it than "The Big Bang" ?
> -- What would _you_ call the creation of the universe ?
> -- "The HORRENDOUS SPACE KABLOOIE !" - Calvin and Hobbes
> ------------------------------------------ http://www.islande.org/ -----
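
The column-count formatting mentioned in the quoted patch notes can be
sketched as follows. This is only an illustration, not the code from the
patch: utf8_display_width() is a hypothetical name, the input is assumed
to be well-formed UTF-8, and only the combining diacritical marks
U+0300..U+036F are treated as zero-width (a real implementation would
need a fuller table and would also handle double-width characters).

```c
#include <stddef.h>

/*
 * Hypothetical sketch: count the display columns of a UTF-8 buffer.
 * Assumes well-formed input (continuation bytes are not validated).
 * Combining marks in U+0300..U+036F take no column; every other
 * code point is counted as one column.
 */
size_t
utf8_display_width(const unsigned char *s, size_t len)
{
    size_t cols = 0;
    size_t i = 0;

    while (i < len)
    {
        unsigned long cp;
        size_t n;

        if (s[i] < 0x80)
        {
            cp = s[i];
            n = 1;
        }
        else if ((s[i] & 0xE0) == 0xC0 && i + 1 < len)
        {
            cp = ((unsigned long) (s[i] & 0x1F) << 6) | (s[i + 1] & 0x3F);
            n = 2;
        }
        else if ((s[i] & 0xF0) == 0xE0 && i + 2 < len)
        {
            cp = ((unsigned long) (s[i] & 0x0F) << 12) |
                 ((unsigned long) (s[i + 1] & 0x3F) << 6) |
                 (s[i + 2] & 0x3F);
            n = 3;
        }
        else if ((s[i] & 0xF8) == 0xF0 && i + 3 < len)
        {
            cp = ((unsigned long) (s[i] & 0x07) << 18) |
                 ((unsigned long) (s[i + 1] & 0x3F) << 12) |
                 ((unsigned long) (s[i + 2] & 0x3F) << 6) |
                 (s[i + 3] & 0x3F);
            n = 4;
        }
        else
        {
            cp = 0xFFFD;                /* invalid or truncated byte */
            n = 1;
        }

        if (!(cp >= 0x0300 && cp <= 0x036F))
            cols++;                     /* combining marks add no column */
        i += n;
    }
    return cols;
}
```

For example, "e" followed by U+0301 (COMBINING ACUTE ACCENT) occupies
three bytes but only one column, which is why byte counts or character
counts alone misalign tabular output.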
