Unicode vs SQL_ASCII DBs

From: "John Sidney-Woollett" <johnsw(at)wardbrook(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Unicode vs SQL_ASCII DBs
Date: 2004-01-31 15:32:41
Message-ID: 3379.192.168.0.64.1075563161.squirrel@mercury.wardbrook.com
Lists: pgsql-general

Hi

I need to store accented characters in a postgres (7.4) database, and
access the data (mostly) using the postgres JDBC driver (from a web app).

Does anyone know:

1) Is there a performance penalty for using (multibyte) UNICODE versus a
(single-byte) SQL_ASCII/LATINx encoding, in terms of extra storage and
searching/sorting speed?

2) Can SQL_ASCII be used to store accented characters?

3) If I want accented characters to sort correctly, must I select UNICODE
(or the appropriate ISO 8859 char set) over SQL_ASCII?

4) I'm not initially expecting Arabic, Chinese, Cyrillic or other such
scripts to be stored in the database. But if they were, would UNICODE be
the best encoding scheme to use for future-proofing the data?

5) If I decide not to support/store non-Latin languages, is it better to
use a LATINx encoding instead of UNICODE?

6) What is MULE_INTERNAL? Does it offer performance advantages?

[This next question probably belongs on the JDBC list, but I'll ask anyway]

7) Because the database is being used to back a Java web application, are
there other issues that I need to be aware of? For example, do I have to
convert all incoming data to UTF-8 before writing it into the database?
And do I have to ensure that the webserver's response content-type is
always set to UTF-8 for the data to render correctly in a user's browser?
I've sketched below what I'm assuming is needed.
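
For what it's worth, here is the kind of thing I'm assuming I'd need on
the web tier (just a sketch; the filter class name and the charset choices
are mine):

import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;

// Force UTF-8 on every request/response so that form data is decoded
// correctly and the browser knows how to render the reply.
public class Utf8Filter implements Filter {
    public void init(FilterConfig config) throws ServletException {}

    public void doFilter(ServletRequest request, ServletResponse response,
                         FilterChain chain) throws IOException, ServletException {
        // Must be called before the first getParameter() call,
        // otherwise the container decodes using its default charset.
        request.setCharacterEncoding("UTF-8");
        // Sets the response content-type header to declare UTF-8.
        response.setContentType("text/html; charset=UTF-8");
        chain.doFilter(request, response);
    }

    public void destroy() {}
}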

Thanks for any help/advice.

John Sidney-Woollett

ps I did some tests between two databases; one created using UNICODE
encoding, and the other using SQL_ASCII encoding. The initdb command
specified no encoding, so I guess that SQL_ASCII is the default encoding.
The results are below:

I created the following table in two databases:

Table "public.table1"
Column | Type | Modifiers
--------+------------------------+-----------
id | integer | not null
data | character varying(255) |

Here are the databases; one with UNICODE encoding, and the other with
SQL_ASCII encoding.

         List of databases
     Name     |  Owner   | Encoding
--------------+----------+-----------
 johntest     | postgres | UNICODE
 johntest2    | postgres | SQL_ASCII
 template0    | postgres | SQL_ASCII
 template1    | postgres | SQL_ASCII

I inserted the data into both databases in psql (first setting the client
encoding to LATIN1). Here is the data (the funny character is the word
"test" with the e replaced by an e-acute, LATIN1 code 233):

set client_encoding to 'LATIN1';

insert into table1 values (1, 'tést');
insert into table1 values (2, 'tast');
insert into table1 values (3, 'tost');
insert into table1 values (4, 'test');
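
For the JDBC angle, I assume the equivalent insert from Java would go
through a PreparedStatement and the driver would handle the conversion
(a rough sketch; the connection URL and credentials are made up):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class InsertTest {
    public static void main(String[] args) throws Exception {
        Class.forName("org.postgresql.Driver");
        // Made-up connection details for the example.
        Connection conn = DriverManager.getConnection(
            "jdbc:postgresql://localhost/johntest", "postgres", "secret");
        PreparedStatement ps =
            conn.prepareStatement("insert into table1 values (?, ?)");
        ps.setInt(1, 1);
        // \u00e9 is e-acute; Java strings are Unicode internally, and
        // the driver converts them to the database encoding on the wire.
        ps.setString(2, "t\u00e9st");
        ps.executeUpdate();
        ps.close();
        conn.close();
    }
}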

Now select the data, ordered by the data column:

(UNICODE DB): select * from table1 order by data;
 id | data
----+------
  2 | tast
  4 | test
  1 | tést
  3 | tost

(SQL_ASCII DB): select * from table1 order by data;
 id | data
----+------
  2 | tast
  4 | test
  3 | tost
  1 | tést

NOTE: Bad ordering using SQL_ASCII... the é ends up sorted after all of
the plain ASCII letters.

Then I ran another test: I set the client encoding to UNICODE and
retrieved the data from both databases:

set client_encoding to 'UNICODE';

(SQL_ASCII DB): select * from table1 order by data;
 id | data
----+------
  2 | tast
  4 | test
  3 | tost
  1 | tst

NOTE: You can see that the e-acute has been "lost"... I assume this is
because the SQL_ASCII database stored the raw LATIN1 byte (0xE9), which
is not a valid UTF-8 sequence, so it cannot be converted when the client
asks for UNICODE output.

(UNICODE DB): select * from table1 order by data;
 id | data
----+------
  2 | tast
  4 | test
  1 | tést
  3 | tost
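
And the read-back from Java, which is what the web app will actually be
doing (again just a sketch, with the same made-up connection details as
above), to check the round trip:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SelectTest {
    public static void main(String[] args) throws Exception {
        Class.forName("org.postgresql.Driver");
        Connection conn = DriverManager.getConnection(
            "jdbc:postgresql://localhost/johntest", "postgres", "secret");
        Statement stmt = conn.createStatement();
        ResultSet rs =
            stmt.executeQuery("select id, data from table1 order by data");
        while (rs.next()) {
            // getString() hands back a normal Unicode Java string,
            // whatever the database encoding is.
            System.out.println(rs.getInt("id") + " | " + rs.getString("data"));
        }
        rs.close();
        stmt.close();
        conn.close();
    }
}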
