Re: [HACKERS] Unicode ready?

From: Jean-Michel POURE <jm(dot)poure(at)freesurf(dot)fr>
To: "Kevin McPherson" <kevinmcp(at)en-tranz(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: [HACKERS] Unicode ready?
Date: 2002-04-02 20:56:55
Message-ID: 200204022056.g32Kuthk031747@www1.translationforge
Lists: pgsql-general pgsql-hackers

On Tuesday 2 April 2002 12:53, you wrote:
> Is PostgreSQL unicode compliant/ready?
> Does it store/export text in Unicode wide-character format, or single
> character strings?

[By the way: there are several Unicode encodings (UTF-8, UTF-16, UCS-2).
UTF-8 is the most popular because each character is coded on 1 to 4 bytes
and ASCII characters keep their single-byte codes. Thus UTF-8 extracts can
be read in a normal text editor. Conversely, UTF-16 is coded on 16-bit
units, so even plain ASCII text can't be read easily.]
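
To make the size difference concrete, here is a small Python sketch (Python
is only used for illustration; the helper name encoded_sizes is made up for
this example and has nothing to do with PostgreSQL itself). It counts the
bytes each encoding needs for one character.

```python
# Sample characters, written as escapes so the file encoding does not matter:
# "A", e-acute, the Euro sign, and an emoji outside the 16-bit range.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

def encoded_sizes(ch):
    """Return (utf8_bytes, utf16_bytes) for a single character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))

for ch in samples:
    u8, u16 = encoded_sizes(ch)
    print(f"U+{ord(ch):04X}: UTF-8 = {u8} byte(s), UTF-16 = {u16} byte(s)")

# ASCII text is byte-for-byte identical in UTF-8, which is why a plain
# text editor can still read the ASCII parts of a UTF-8 extract.
assert "SELECT 1".encode("utf-8") == "SELECT 1".encode("ascii")
```

In UTF-16 even "A" takes two bytes, one of which is a zero byte, and that is
what confuses ordinary text tools.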

I guess your question was "Is PostgreSQL multi-byte safe and Unicode ready?"

1) Server side
a) PostgreSQL needs to be compiled with
--enable-recode
--enable-multibyte
b) Create a database with
CREATE DATABASE foo WITH ENCODING = 'UNICODE' (which means UTF-8 in PostgreSQL)

Several other multi-byte encodings are available. In the case of Unicode,
data is stored in UTF-8 format. String operations and searches work on
whole characters, not on 8-bit bytes.

2) Client side
By default the connection uses the server encoding. But it is possible to
automatically recode a connection on the fly using:

SET CLIENT_ENCODING = Latin9 (this example recodes Unicode streams to Western
European with the Euro symbol). Several connections can be recoded at the
same time.
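
As a rough picture of what such a recoding does to the bytes, here is a
Python sketch. It is not how the server implements it; the function name
recode and the sample string are assumptions made for this example. It
decodes UTF-8 and re-encodes the text as Latin-9 (ISO 8859-15).

```python
def recode(data: bytes, source: str = "utf-8", target: str = "iso8859_15") -> bytes:
    """Decode bytes from the source encoding and re-encode them in the target."""
    return data.decode(source).encode(target)

# Sample text containing the Euro sign (U+20AC).
server_bytes = "Prix : 10 \u20ac".encode("utf-8")  # as a UTF-8 database would hold it
client_bytes = recode(server_bytes)                 # as a Latin-9 client would see it

# The Euro sign takes three bytes in UTF-8 but one byte in Latin-9.
print(len(server_bytes), len(client_bytes))

# A character with no Latin-9 equivalent cannot be recoded this way.
try:
    recode("\u65e5".encode("utf-8"))
except UnicodeEncodeError as exc:
    print("cannot recode:", exc.reason)
```

The last case is worth keeping in mind: a single-byte client encoding can
only show the characters it actually contains.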

3) ODBC interface
The current ODBC interface provides Unicode in UTF-8 encoding. But
Microsoft platforms need Unicode in UCS-2 encoding (e.g. Access 2000).
Therefore, you will be able to view data under OpenOffice but not Microsoft
Office.

The new ODBC driver in CVS supports UCS-2.
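
To show why the two forms are not interchangeable, here is a Python sketch
of the conversion. It is only an illustration, not the driver's code, and
the function name utf8_to_ucs2 is made up. Python has no UCS-2 codec, so the
sketch assumes little-endian UTF-16, which matches UCS-2 for characters up
to U+FFFF.

```python
def utf8_to_ucs2(data: bytes) -> bytes:
    """Convert UTF-8 bytes to little-endian 16-bit units.

    Uses UTF-16-LE as a stand-in for UCS-2 and rejects characters above
    U+FFFF, where the two encodings differ.
    """
    text = data.decode("utf-8")
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("character outside the 16-bit range")
    return text.encode("utf-16-le")

utf8 = "caf\u00e9".encode("utf-8")   # 5 bytes: the e-acute takes two
ucs2 = utf8_to_ucs2(utf8)            # 8 bytes: every character takes two

print(utf8.hex(" "))
print(ucs2.hex(" "))
```

A program that expects one form and receives the other sees garbage, which
is why the driver has to do the conversion.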

4) Server side languages
Server-side languages are the traditional weakness of Unicode programming.
When writing code, you need to calculate the length of a string, crop the
left side of it, etc. In PHP, this is done using the special mbstring
library. Usually, this breaks your code because the library provides
separate multi-byte functions (mb_strlen instead of strlen, for example)
that existing code does not call.

This is not the case in PostgreSQL, where all PL/pgSQL functions are
multi-byte safe. Because of PHP instability, I ported several functions to
PL/pgSQL.
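
The difference between counting bytes and counting characters can be seen in
this Python sketch. It is an illustration only; the helper names left_chars
and left_bytes are made up and stand for a multi-byte-safe and a
byte-oriented "left" function respectively.

```python
text = "\u00e9t\u00e9 2002"    # "ete 2002" with accents: 8 characters
raw = text.encode("utf-8")     # 10 bytes, since each accented "e" takes two

def left_chars(s: str, n: int) -> str:
    """Keep the first n characters (multi-byte safe)."""
    return s[:n]

def left_bytes(b: bytes, n: int) -> bytes:
    """Keep the first n bytes, which may cut a character in half."""
    return b[:n]

print(len(text), len(raw))     # character count versus byte count

# Cropping by characters keeps whole characters.
print(left_chars(text, 3))

# Cropping at byte 4 stops in the middle of the second accented "e",
# so the result is no longer valid UTF-8.
try:
    left_bytes(raw, 4).decode("utf-8")
except UnicodeDecodeError:
    print("byte-wise crop produced invalid UTF-8")
```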

PostgreSQL is a pure marvel.

For additional questions, please post to pgsql-general(at)postgresql(dot)org(dot)

Cheers,
Jean-Michel POURE
