On Thu, 4 Jun 1998 t-ishii(at)sra(dot)co(dot)jp wrote:
> >Hi. I'm looking for non-English-using Postgres hackers to participate in
> >implementing NCHAR() and alternate character sets in Postgres. I think
> >I've worked out how to do the implementation (not the details, just a
> >strategy) so that multiple character sets will be allowed in a single
> >database, additional character sets can be loaded at run-time, and so
> >that everything will behave transparently.
> That sounds like an interesting idea... But before going into the
> discussion, let me clarify what "character set" means. A character set
> consists of some characters. One of the most famous character sets is
> ISO646 (almost the same as ASCII). In western Europe, the ISO 8859
> series of character sets is widely used. For example, ISO 8859-1 covers
> English, French, German etc. and ISO 8859-2 covers Albanian, Romanian
> etc. These are "single byte" sets, and there is a one-to-many
> correspondence between a character set and languages.
> ISO 8859-1 <------> English, French, German
> On the other hand, some Asian languages such as Japanese, Chinese, and
> Korean do not correspond to a single character set, but rather to
> multiple character sets.
> ASCII, JIS X0208, JIS X0201, JIS X0212 <-------> Japanese
> (ASCII, JIS X0208, JIS X0201, JIS X0212 are individual character sets)
> An "encoding" is a way to represent a set of character sets in
> computers. The above set of character sets is encoded in the EUC_JP
> encoding. I think SQL92 uses the term "character set" to mean encoding.
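
As a rough illustration of the distinction drawn above (several character sets, one encoding), here is a minimal Python sketch. It assumes Python's built-in `euc_jp` codec; the sample characters and the helper name `euc_jp_bytes` are chosen only for this example and are not from the original message.

```python
# One sample character from each character set named above.
# All four can be stored in the single encoding EUC_JP.
samples = [
    ("ASCII", "A"),
    ("JIS X0208", "\u3042"),  # hiragana A
    ("JIS X0201", "\uff71"),  # half-width katakana A
    ("JIS X0212", "\u4e02"),  # a supplementary kanji
]


def euc_jp_bytes(ch):
    """Hypothetical helper: encode one character with the euc_jp codec."""
    return ch.encode("euc_jp")


encoded = {name: euc_jp_bytes(ch) for name, ch in samples}

# Show how many bytes each character set needs inside EUC_JP.
for name, raw in encoded.items():
    print(f"{name:10s} {len(raw)} byte(s): {raw.hex()}")

# A string mixing all four character sets survives a round trip.
text = "".join(ch for _, ch in samples)
round_trip = text.encode("euc_jp").decode("euc_jp")
```

In this sketch the byte length and the lead byte tell the character sets apart: ASCII stays at one byte, JIS X0208 takes two, and the JIS X0201 kana and JIS X0212 characters are introduced by the shift bytes 0x8E and 0x8F.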
> >So, the initial questions:
> >1) Is the NCHAR/NVARCHAR/CHARACTER SET syntax and usage acceptable for
> >non-English applications? Do other databases use this SQL92 convention,
> >or does it have difficulties?
> As far as I know, there is no commercial RDBMS that supports
> NCHAR/NVARCHAR/CHARACTER SET syntax. Oracle supports multiple
> encodings. An encoding for a database is defined while creating the
> database and cannot be changed at runtime. Clients can use a different
> encoding as long as it is a "subset" of the database's encoding. For
> example, an Oracle client can use ASCII if the database encoding is
I tried the following databases on Linux and none of them has this feature:
. ADABAS D
I found only one under M$-Windows that implements this feature:
I'm playing with it, but so far I don't understand its behavior.
There's some interesting documentation about it in the OCELOT manual;
if you want, I can send it to you.
> I think the idea of defining the "default" encoding for a database
> at database creation time is nice.
> create database with encoding EUC_JP;
> If the NCHAR/NVARCHAR/CHARACTER SET syntax were supported, a user
> could use an encoding other than EUC_JP. That sounds very nice too.
> >2) Would anyone be interested in helping to define the character sets
> >and helping to test? I don't know the correct collation sequences and
> >don't think they would display properly on my screen...
> I would be able to help you in the Japanese part. For Chinese and
> Korean, I'm going to find volunteers in the local PostgreSQL mailing
> list I'm running if necessary.
I may help with Italian, Spanish and Portuguese.
> >3) I'd like to implement the existing Cyrillic and EUC-jp character
> >sets, and also some European languages (French and ??) which use the
> >Latin-1 alphabet but might have different collation sequences. Any
> >suggestions for candidates??
> Collation sequences for EUC_JP? How nice that would be! One problem
> with collation sequences for multi-byte encodings is that the sequence
> might become huge. It seems you have a solution for that. Please let me
> know more details.
> Tatsuo Ishii
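
To put a rough number on the "huge" collation sequence concern, the following Python sketch compares table sizes. It assumes one collation weight per code position and a 2-byte weight, both of which are illustrative assumptions rather than anything proposed in this thread. The only hard fact used is that JIS X0208 and JIS X0212 are each laid out as a 94 x 94 grid of code positions.

```python
# Rough sizing of a flat collation table: one weight per code position.
ROWS = 94   # rows in a JIS X0208 / JIS X0212 grid
CELLS = 94  # cells per row

single_byte_entries = 256            # e.g. one ISO 8859 character set
double_byte_entries = ROWS * CELLS   # one 94 x 94 character set

# EUC_JP combines two such grids with the single-byte sets.
euc_jp_entries = single_byte_entries + 2 * double_byte_entries

WEIGHT_BYTES = 2  # assumed size of one collation weight
print("single-byte table:", single_byte_entries * WEIGHT_BYTES, "bytes")
print("EUC_JP table:     ", euc_jp_entries * WEIGHT_BYTES, "bytes")
```

Under these assumptions the EUC_JP table needs roughly seventy times as many entries as a single-byte one, before any per-language variation in ordering is considered.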