
Re: [PATCHES] Postgres-6.3.2 locale patch

From: "Jose' Soares Da Silva" <sferac(at)bo(dot)nettuno(dot)it>
To: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
Cc: "Thomas G. Lockhart" <lockhart(at)alumni(dot)caltech(dot)edu>, phd2(at)earthling(dot)net, Postgres Hackers List <hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] Postgres-6.3.2 locale patch
Date: 1998-06-04 10:13:31
Message-ID: Pine.LNX.3.96.980604100622.945C-100000@proxy.bazzanese.com
Lists: pgsql-hackers
On Thu, 4 Jun 1998 t-ishii(at)sra(dot)co(dot)jp wrote:

> >Hi. I'm looking for non-English-using Postgres hackers to participate in
> >implementing NCHAR() and alternate character sets in Postgres. I think
> >I've worked out how to do the implementation (not the details, just a
> >strategy) so that multiple character sets will be allowed in a single
> >database, additional character sets can be loaded at run-time, and so
> >that everything will behave transparently.
> 
> Sounds like an interesting idea... But before going into the discussion,
> let me clarify what "character set" means. A character set consists of
> some characters. One of the most famous character sets is ISO 646
> (almost the same as ASCII). In western Europe, the ISO 8859 series of
> character sets is widely used. For example, ISO 8859-1 covers English,
> French, German etc. and ISO 8859-2 covers Albanian, Romanian
> etc. These are "single byte" sets, and there is a one-to-many
> correspondence between a character set and languages.
> 
> Example1:
> ISO 8859-1 <------> English, French, German
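
To make Example1 concrete, here is a minimal Python sketch. It is only an illustration and has nothing to do with the Postgres code; the helper name encode_or_none is hypothetical, and the codec names are the ones Python's standard library uses. It shows that letters from French and German each fit in one byte of ISO 8859-1, while a Romanian letter needs ISO 8859-2.

```python
def encode_or_none(text, encoding):
    """Return the encoded bytes, or None if the character set lacks a character."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None

# French e-acute and German sharp s: one byte each in ISO 8859-1.
latin1_e_acute = encode_or_none("\u00e9", "iso8859_1")
latin1_sharp_s = encode_or_none("\u00df", "iso8859_1")

# Romanian a-breve: one byte in ISO 8859-2, absent from ISO 8859-1.
latin2_a_breve = encode_or_none("\u0103", "iso8859_2")
latin1_a_breve = encode_or_none("\u0103", "iso8859_1")

print(latin1_e_acute, latin1_sharp_s, latin2_a_breve, latin1_a_breve)
```
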
> 
> On the other hand, some Asian languages such as Japanese, Chinese, and
> Korean do not correspond to a single character set; rather, each
> corresponds to multiple character sets.
> 
> Example2:
> ASCII, JIS X0208, JIS X0201, JIS X0212 <-------> Japanese
> (ASCII, JIS X0208, JIS X0201, JIS X0212 are individual character sets)
> 
> An "encoding" is a way to represent a set of character sets in
> computers. The above set of character sets is encoded in the EUC_JP
> encoding.
> 
> I think SQL92 uses the term "character set" to mean an encoding.
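
A short Python sketch of the point above, offered as an illustration only: one character from each of three of the character sets in Example2 is encoded with the single EUC_JP encoding. The helper euc_jp_hex and the sample characters are my own choices; "euc_jp" is the codec name in Python's standard library.

```python
def euc_jp_hex(ch):
    """Return the EUC_JP byte sequence of one character as a hex string."""
    return ch.encode("euc_jp").hex()

# One sample character per character set (chosen for illustration).
samples = {
    "ASCII": "A",                  # LATIN CAPITAL LETTER A
    "JIS X0208": "\u3042",         # HIRAGANA LETTER A
    "JIS X0201 kana": "\uff71",    # HALFWIDTH KATAKANA LETTER A
}

for charset, ch in samples.items():
    print(f"{charset:15} U+{ord(ch):04X} -> {euc_jp_hex(ch)}")
```

The ASCII character stays a single byte, the JIS X0208 character becomes two bytes, and the JIS X0201 kana is prefixed with the byte 0x8e, so the three character sets stay distinguishable inside one byte stream.
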
> 
> >So, the initial questions:
> >
> >1) Is the NCHAR/NVARCHAR/CHARACTER SET syntax and usage acceptable for
> >non-English applications? Do other databases use this SQL92 convention,
> >or does it have difficulties?
> 
> As far as I know, there is no commercial RDBMS that supports the
> NCHAR/NVARCHAR/CHARACTER SET syntax. Oracle supports multiple
> encodings. The encoding for a database is defined when creating the
> database and cannot be changed at runtime. Clients can use a different
> encoding as long as it is a "subset" of the database's encoding. For
> example, an Oracle client can use ASCII if the database encoding is
> EUC_JP.
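
The "subset" condition can be sketched in Python as a round-trip check. This is an assumption-laden illustration, not a description of how Oracle actually checks anything: the function is_subset_safe is hypothetical, and it only tests whether bytes produced in the client encoding decode to the same text under the database encoding.

```python
def is_subset_safe(text, client_encoding, db_encoding):
    """True if text encoded by the client decodes unchanged in the database encoding."""
    try:
        raw = text.encode(client_encoding)
        return raw.decode(db_encoding) == text
    except (UnicodeEncodeError, UnicodeDecodeError):
        return False

# ASCII bytes mean the same thing in EUC_JP, so an ASCII client is safe.
ascii_ok = is_subset_safe("SELECT 1", "ascii", "euc_jp")

# A Latin-1 byte such as 0xE9 is not valid EUC_JP on its own.
latin1_ok = is_subset_safe("caf\u00e9", "iso8859_1", "euc_jp")

print(ascii_ok, latin1_ok)
```
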

I tried the following databases on Linux, and none of them has this feature:
. MySQL
. Solid
. Empress
. Kubl
. ADABAS D

I found only one, under M$-Windows, that implements this feature:
. OCELOT
I'm playing with it, but so far I don't understand its behavior.
There is some interesting documentation about it in the OCELOT manual;
if you want, I can send it to you.

> 
> I think the idea of defining the "default" encoding for a database
> at database creation time is nice.
> 
> create database with encoding EUC_JP;
> 
> If the NCHAR/NVARCHAR/CHARACTER SET syntax were supported, a user
> could use an encoding other than EUC_JP. Sounds very nice too.
> 
> >2) Would anyone be interested in helping to define the character sets
> >and helping to test? I don't know the correct collation sequences and
> >don't think they would display properly on my screen...
> 
> I would be able to help you with the Japanese part. For Chinese and
> Korean, I'm going to find volunteers on the local PostgreSQL mailing
> list I'm running, if necessary.

I can help with Italian, Spanish and Portuguese.

> 
> >3) I'd like to implement the existing Cyrillic and EUC-jp character
> >sets, and also some European languages (French and ??) which use the
> >Latin-1 alphabet but might have different collation sequences. Any
> >suggestions for candidates??
> 
> Collation sequences for EUC_JP? How nice that would be! One problem
> with collation sequences for multi-byte encodings is that the sequence
> might become huge. It seems you have a solution for that. Please let me
> know more details.
> --
> Tatsuo Ishii
> t-ishii(at)sra(dot)co(dot)jp
                                                            Ciao, Jose'

