Re: [PATCHES] Postgres-6.3.2 locale patch

From: "Jose' Soares Da Silva" <sferac(at)bo(dot)nettuno(dot)it>
To: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
Cc: "Thomas G(dot) Lockhart" <lockhart(at)alumni(dot)caltech(dot)edu>, phd2(at)earthling(dot)net, Postgres Hackers List <hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Postgres-6.3.2 locale patch
Date: 1998-06-04 10:13:31
Message-ID: Pine.LNX.3.96.980604100622.945C-100000@proxy.bazzanese.com
Lists: pgsql-hackers

On Thu, 4 Jun 1998 t-ishii(at)sra(dot)co(dot)jp wrote:

> >Hi. I'm looking for non-English-using Postgres hackers to participate in
> >implementing NCHAR() and alternate character sets in Postgres. I think
> >I've worked out how to do the implementation (not the details, just a
> >strategy) so that multiple character sets will be allowed in a single
> >database, additional character sets can be loaded at run-time, and so
> >that everything will behave transparently.
>
> Sounds like an interesting idea... But before going into the discussion,
> let me clarify what "character set" means. A character set consists of
> some characters. One of the most famous character sets is ISO646
> (almost the same as ASCII). In western Europe, the ISO 8859 series of
> character sets is widely used. For example, ISO 8859-1 covers English,
> French, German etc. and ISO 8859-2 covers Albanian, Romanian
> etc. These are "single byte" and there is a one-to-many correspondence
> between the character set and languages.
>
> Example1:
> ISO 8859-1 <------> English, French, German
>
> On the other hand, some Asian languages such as Japanese, Chinese, and
> Korean do not correspond to a single character set, but rather to
> multiple character sets.
>
> Example2:
> ASCII, JIS X0208, JIS X0201, JIS X0212 <-------> Japanese
> (ASCII, JIS X0208, JIS X0201, JIS X0212 are individual character sets)
>
> An "encoding" is a way to represent a set of character sets in
> computers. The above set of character sets is encoded in the EUC_JP
> encoding.
>
> I think SQL92 uses the term "character set" to mean encoding.
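
A minimal sketch of the character set versus encoding distinction above, assuming Python's built-in "euc_jp" and "latin_1" codecs as stand-ins for the encodings named in the quoted text. The sample characters and the helper name euc_jp_bytes are arbitrary choices for illustration, not anything from the proposal:

```python
# Illustration only: Python's built-in codecs stand in for the encodings
# discussed above. Nothing here is Postgres code.

# (character set, sample character) pairs; the samples are arbitrary picks.
SAMPLES = [
    ("ASCII", "A"),
    ("JIS X0201 half-width katakana", "\uff71"),
    ("JIS X0208", "\u3042"),
    ("JIS X0212", "\u00e9"),
]


def euc_jp_bytes(ch):
    """Return the EUC_JP byte sequence for one character."""
    return ch.encode("euc_jp")


for charset, ch in SAMPLES:
    encoded = euc_jp_bytes(ch)
    hex_bytes = " ".join(f"{b:02x}" for b in encoded)
    print(f"{charset:32} U+{ord(ch):04X} -> {hex_bytes} ({len(encoded)} byte(s))")

# The same e-acute that needs several bytes in EUC_JP is one byte in ISO 8859-1.
print("ISO 8859-1 e-acute:", "\u00e9".encode("latin_1").hex())
```

One encoding carries characters from four character sets, each with its own byte layout, while ISO 8859-1 stays at one byte per character.
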
>
> >So, the initial questions:
> >
> >1) Is the NCHAR/NVARCHAR/CHARACTER SET syntax and usage acceptable for
> >non-English applications? Do other databases use this SQL92 convention,
> >or does it have difficulties?
>
> As far as I know, there is no commercial RDBMS that supports the
> NCHAR/NVARCHAR/CHARACTER SET syntax. Oracle supports multiple
> encodings. An encoding for a database is defined when creating the
> database and cannot be changed at runtime. Clients can use a different
> encoding as long as it is a "subset" of the database's encoding. For
> example, an Oracle client can use ASCII if the database encoding is
> EUC_JP.
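
The "subset" relation in the ASCII/EUC_JP example can be checked with a small sketch. It assumes Python's built-in codecs and a hypothetical helper, is_valid_in; it says nothing about how Oracle itself performs the check:

```python
def is_valid_in(data: bytes, encoding: str) -> bool:
    """Hypothetical helper: True if the bytes decode cleanly in the encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


ascii_text = "SELECT 1".encode("ascii")
euc_text = "\u3042".encode("euc_jp")  # one JIS X0208 character

# ASCII bytes are valid EUC_JP and decode to the same text...
print(is_valid_in(ascii_text, "euc_jp"))
print(ascii_text.decode("euc_jp") == ascii_text.decode("ascii"))

# ...but EUC_JP multi-byte sequences are not valid ASCII.
print(is_valid_in(euc_text, "ascii"))
```
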

I tried the following databases on Linux and none of them has this feature:
. MySQL
. Solid
. Empress
. Kubl
. ADABAS D

I found only one, under M$-Windows, that implements this feature:
. OCELOT
I'm playing with it, but so far I don't understand its behavior.
There's some interesting documentation about it in the OCELOT manual;
if you want I can send it to you.

>
> I think the idea of the "default" encoding for a database being
> defined at database creation time is nice.
>
> create database with encoding EUC_JP;
>
> If the NCHAR/NVARCHAR/CHARACTER SET syntax were supported, a user
> could use an encoding other than EUC_JP. Sounds very nice too.
>
> >2) Would anyone be interested in helping to define the character sets
> >and helping to test? I don't know the correct collation sequences and
> >don't think they would display properly on my screen...
>
> I would be able to help you in the Japanese part. For Chinese and
> Korean, I'm going to find volunteers in the local PostgreSQL mailing
> list I'm running if necessary.

I can help with Italian, Spanish and Portuguese.

>
> >3) I'd like to implement the existing Cyrillic and EUC-jp character
> >sets, and also some European languages (French and ??) which use the
> >Latin-1 alphabet but might have different collation sequences. Any
> >suggestions for candidates??
>
> Collation sequences for EUC_JP? How nice that would be! One problem
> with collation sequences for multi-byte encodings is that the sequence
> might become huge. It seems you have a solution for that. Please let me
> know more details.
> --
> Tatsuo Ishii
> t-ishii(at)sra(dot)co(dot)jp
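
A back-of-the-envelope sketch of the size concern raised above. The code-position counts follow from the 94x94 grid of the JIS double-byte sets. The flat "one weight per code position" table and its 2-byte weight are assumptions for illustration only, not the solution hinted at in the thread:

```python
# Code positions a single-byte encoding such as ISO 8859-1 can address.
SINGLE_BYTE_POSITIONS = 256

# Code positions reachable through EUC_JP, per underlying character set.
EUC_JP_POSITIONS = {
    "ASCII": 128,
    "JIS X0208": 94 * 94,
    "JIS X0201 half-width katakana": 63,
    "JIS X0212": 94 * 94,
}

WEIGHT_BYTES = 2  # assumed size of one collation weight


def flat_table_bytes(positions, weight_bytes=WEIGHT_BYTES):
    """Size of a hypothetical flat table holding one weight per position."""
    return positions * weight_bytes


euc_jp_positions = sum(EUC_JP_POSITIONS.values())

print("single byte:", SINGLE_BYTE_POSITIONS, "positions,",
      flat_table_bytes(SINGLE_BYTE_POSITIONS), "bytes")
print("EUC_JP:     ", euc_jp_positions, "positions,",
      flat_table_bytes(euc_jp_positions), "bytes")
```
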
Ciao, Jose'
