
Re: ENCODING (Unicode)

From: Jean-Michel POURE <jm(dot)poure(at)freesurf(dot)fr>
To: Reshat Sabiq <sabiq(at)purdue(dot)edu>
Cc: pgadmin-support(at)postgresql(dot)org
Subject: Re: ENCODING (Unicode)
Date: 2003-05-21 07:50:52
Message-ID: 200305210950.52815.jm.poure@freesurf.fr
Lists: pgadmin-support, pgsql-novice
On Wednesday 21 May 2003 09:10, Reshat Sabiq wrote:
> Given that i can insert and retrieve Unicode values into either ASCII-based
> or Unicode-based DB, is Unicode-based DB less efficient? I remember reading
> something about it a while ago. I don't see immediately why that would be
> the case though, because special characters are 2 bytes either way,
> assuming we are not simplifying Unicode characters into ASCII.

Dear Reshat,

In Unicode (UTF-8), characters are coded on 1 byte (ASCII, e.g. unaccented
English letters), 2 bytes (accented Latin letters, Greek, Cyrillic, Hebrew,
Arabic), 3 bytes (most other scripts, including Indian and East Asian
languages) or 4 bytes (characters outside the Basic Multilingual Plane).
Technically, you can store UTF-8 values in an ASCII-based database.
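
As a quick check of these byte counts, here is a minimal sketch in Python
(my choice for illustration only, it is not part of the original question).
The helper name utf8_len and the sample characters are arbitrary.

```python
# Number of bytes UTF-8 needs to encode a single character.
def utf8_len(ch: str) -> int:
    return len(ch.encode("utf-8"))

# Sample characters, written as escapes so the source file encoding
# does not matter.
samples = {
    "a": "ASCII letter",
    "\u00e9": "accented Latin letter (e acute)",
    "\u0436": "Cyrillic letter (zhe)",
    "\u65e5": "CJK ideograph",
    "\U0001F600": "character outside the Basic Multilingual Plane",
}

for ch, label in samples.items():
    print(f"{label}: {utf8_len(ch)} byte(s)")
```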

But storing UTF-8 in an ASCII database is not recommended, for several
reasons:

- the query parser might not work well with text values (because it will not
know whether one UTF-8 letter is made of 1, 2 or 3 bytes).

- server-side languages are multi-byte safe only when the database knows the
encoding. If you calculate the length of a UTF-8 string stored in an ASCII
database from PL/pgSQL, the result will probably be wrong for special
characters, because bytes are counted instead of characters.
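
To make the length problem concrete, here is a small Python sketch. It only
imitates the two behaviours (counting characters versus counting bytes); it
does not talk to PostgreSQL, and the sample word is arbitrary.

```python
text = "caf\u00e9"  # "cafe" with an acute accent: 4 characters

utf8_bytes = text.encode("utf-8")

# What an encoding-aware length function would report.
char_length = len(text)
# What a byte-oriented length function would report for the same value.
byte_length = len(utf8_bytes)

print(char_length, byte_length)

# A byte-oriented substring can also cut a multi-byte character in half:
# the first 4 bytes are "caf" plus only the first byte of the accented e.
truncated = utf8_bytes[:4]
print(truncated.decode("utf-8", errors="replace"))
```

The two lengths differ by one here because the accented letter takes two
bytes; the truncated value no longer decodes to valid text.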

So, the answer is:

1) If you need to search and display multilingual text, you need a UTF-8
database. You will be able to combine all languages in a single database:
Arabic, Polish, Japanese, etc.

But be aware that you will also need a full UTF-8 chain behind the database.
Not all web servers are UTF-8 compliant. Your web pages will also need to be
saved as UTF-8. Take PHP, for example: you will need to enable the mbstring
extension at compile time (--enable-mbstring).
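
A short Python sketch of what goes wrong when one link in the chain assumes
another encoding. Latin-1 is used here only as an example of a wrong
assumption; the sample word is arbitrary.

```python
original = "\u00e9t\u00e9"  # French "ete" with two acute accents

# Bytes as they would travel between database, web server and browser.
wire_bytes = original.encode("utf-8")

# One layer wrongly assumes Latin-1: each 2-byte character turns into
# two unrelated characters.
misread = wire_bytes.decode("latin-1")

# Every layer agrees on UTF-8: the text survives the round trip.
correct = wire_bytes.decode("utf-8")

print(repr(misread))
print(repr(correct))
```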

The recommended way is to design your pages under GNU/Linux, as it supports
UTF-8 encoding very well.

2) If you need to search and display English or Western languages only, an 
ASCII-based database is enough.

Stay tuned. The team will soon test pgAdmin3 UTF-8 compliance. As far as I can 
tell, I could browse UTF-8 data in pgAdmin3.

Cheers,
Jean-Michel
