Re: ENCODING (Unicode)

From: Reshat Sabiq <sabiq(at)purdue(dot)edu>
To: jm(dot)poure(at)freesurf(dot)fr
Cc: pgadmin-support(at)postgresql(dot)org
Subject: Re: ENCODING (Unicode)
Date: 2003-05-21 17:19:05
Message-ID: 3ECBB509.3060800@purdue.edu
Lists: pgadmin-support, pgsql-novice

Jean-Michel POURE wrote:
> In Unicode (UTF-8), characters are coded on 1 byte (US-English letters), 2
> bytes (Western and Eastern Europe languages) and 3 bytes (all other languages
> including Asian and Indian languages). Technically, you can store UTF-8
> values in an ASCII-based database.
>
> But, storing UTF-8 in an ASCII database is not recommended, for several
> reasons:
>
> - the query parser might not work well with text values (because it will not
> know whether 1 UTF-8 letter is made of 1, 2 or 3 bytes).
>
> - server-side languages are multi-byte safe. If you calculate the length of a
> UTF-8 string in PLpgSQL stored in an ASCII database, it will probably fail
> for special characters.
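
To make the length point above concrete, here is a minimal sketch in Python (not PL/pgSQL, and not how the server itself computes lengths). The sample text is an arbitrary choice for illustration. It compares the number of characters in a string with the number of bytes in its UTF-8 form, which is what a byte-oriented length function would count.

```python
# Sample text is an arbitrary choice for illustration ("deja vu" with accents).
text = "d\u00e9j\u00e0 vu"

# The same text as it would be stored in UTF-8.
encoded = text.encode("utf-8")

char_count = len(text)      # counts Unicode code points
byte_count = len(encoded)   # counts UTF-8 bytes

print("characters:", char_count)
print("bytes:     ", byte_count)
```

The two accented letters take 2 bytes each, so the byte count is larger than the character count. A length function that is not aware of the encoding would report the byte count.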

Thanks for your feedback Jean-Michel,

You made a good point; I forgot about the queries. I guess each
character is converted into 4 bytes while parsing, so it makes a big
difference whether the parser sees one 2-byte character (4 bytes) or
two 1-byte characters (8 bytes).

However, I haven't heard of UTF-8 supporting 3-byte values. From what I
know, special characters are 2 bytes in UTF-8. The 2-byte Unicode set is
enough to cover all characters, including Asian ones (with Chinese taking
a couple dozen thousand characters). I read something recently about
3-byte character support in one of the standards (UTF-16?), but the RFC
said there are no 3-byte assignments yet, because the 2-byte range is
currently enough...
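
The byte widths under discussion can be checked directly. The following is a small sketch in Python, offered as a reference check rather than as part of the original exchange. The sample characters and the helper name encoded_widths are my own choices. It reports how many bytes each character occupies in UTF-8 and in UTF-16 (big-endian, so no byte order mark is counted).

```python
# Sample characters chosen for illustration; the descriptions are informal.
samples = {
    "A": "US-ASCII letter",
    "\u00e9": "Latin letter e with acute accent",
    "\u4e2d": "CJK ideograph",
    "\U00010348": "Gothic letter outside the 16-bit range",
}


def encoded_widths(ch):
    """Return (UTF-8 byte count, UTF-16 byte count) for one character."""
    # "utf-16-be" is used so that no byte order mark is added to the count.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))


for ch, description in samples.items():
    utf8_len, utf16_len = encoded_widths(ch)
    print(f"U+{ord(ch):04X} {description}: "
          f"UTF-8 {utf8_len} bytes, UTF-16 {utf16_len} bytes")
```

In this check the CJK ideograph fits in 2 bytes in UTF-16 but takes 3 bytes in UTF-8, and the character outside the 16-bit range takes 4 bytes in both encodings.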

But you are right, I should use UNICODE encoding when I use characters
beyond extended ASCII.
As for applications, I usually use Java, which supports Unicode. I'm
glad that PHP does so as well. And I sure look forward to pgAdmin3.

Good luck,
Reshat.

