Re: Unicode vs SQL_ASCII DBs

From: Kris Jurka <books(at)ejurka(dot)com>
To: John Sidney-Woollett <johnsw(at)wardbrook(dot)com>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Unicode vs SQL_ASCII DBs
Date: 2004-01-31 19:44:42
Message-ID: Pine.LNX.4.33.0401311420240.15073-100000@leary.csoft.net
Lists: pgsql-general

On Sat, 31 Jan 2004, John Sidney-Woollett wrote:

> Hi
>
> I need to store accented characters in a postgres (7.4) database, and
> access the data (mostly) using the postgres JDBC driver (from a web app).
>
> Does anyone know if:
>
> 2) Can SQL_ASCII be used for accented characters.

Not with the JDBC driver. A client which is blissfully unaware of
encoding issues can pass data into and out of a SQL_ASCII db without
knowing what the encoding is, but Java must know.
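To illustrate why Java must know: a Java String is built from bytes by
decoding them with a specific charset, so the same stored byte gives
different text depending on which charset is assumed. The sketch below
uses only the standard library; the class and method names are made up
for this example and are not part of the JDBC driver.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingMatters {
    // Decode raw bytes into a String; the caller has to name a charset.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // The single byte 0xE9 is "é" (U+00E9) in ISO-8859-1.
        byte[] stored = {(byte) 0xE9};

        String asLatin1 = decode(stored, StandardCharsets.ISO_8859_1);
        String asUtf8 = decode(stored, StandardCharsets.UTF_8);

        // Correct guess: we get U+00E9 back.
        System.out.println(asLatin1.equals("\u00e9"));
        // Wrong guess: 0xE9 alone is not valid UTF-8, so the decoder
        // substitutes the replacement character U+FFFD.
        System.out.println(asUtf8.equals("\uFFFD"));
    }
}
```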

>
> 3) If I want accented characters to sort correctly, must I select UNICODE
> (or the appropriate ISO 8859 char set) over SQL_ASCII?

You are confusing encoding with locale. The locale determines the correct
sort order, and you must choose an encoding that works with your locale.
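The difference can be seen on the Java side as well. This is only an
analogy using java.text.Collator, not how the server itself sorts, and
the class and method names are invented for the example: the same
strings compare differently depending on whether you compare raw
character values or apply a locale's collation rules.

```java
import java.text.Collator;
import java.util.Locale;

class LocaleSort {
    // Plain String comparison: orders by UTF-16 code unit value.
    static int byCodeUnit(String a, String b) {
        return a.compareTo(b);
    }

    // Locale-aware comparison: orders by the locale's collation rules.
    static int byLocale(String a, String b, Locale locale) {
        return Collator.getInstance(locale).compare(a, b);
    }

    public static void main(String[] args) {
        String ete = "\u00e9t\u00e9"; // "été"
        String zebra = "zebra";

        // "é" (U+00E9) has a higher value than "z" (U+007A),
        // so the raw comparison puts "été" after "zebra".
        System.out.println(byCodeUnit(ete, zebra) > 0);

        // A collator treats "é" as a variant of "e",
        // so "été" sorts before "zebra".
        System.out.println(byLocale(ete, zebra, Locale.FRENCH) < 0);
    }
}
```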

>
> 4) I'm not initially expecting arabic, chinese, cyrillic or other language
> types to be stored in the database. But if they were, would UNICODE be the
> best encoding scheme to use for future proofing the data?

Yes.

> 7) Because the database is being used to backend a java web application,
> are there other issues that I need to be aware of, for example, do I have
> to convert all data received to UTF-8 before writing it into the database?
> And do I have to ensure that the response (from the webserver)
> content-type is always set to UTF-8 to be rendered correctly in a user's
> browser?

The JDBC driver will correctly handle conversions between the database
encoding and the encoding the JVM is run under. Receiving data from a web
application is a little different because you must convert data from the
client's encoding to the JVM's encoding for this to work. The simplest
way to do this is just to make sure that you are using Unicode in all
three places (server, JVM, and client).
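If the client cannot be made to send Unicode, the conversion step looks
roughly like the sketch below. It assumes you already know the client's
charset (for instance from the request headers); the class and method
names are hypothetical, and only standard library calls are used.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ToUtf8 {
    // Re-encode bytes from the client's declared charset into UTF-8.
    static byte[] toUtf8(byte[] input, Charset clientCharset) {
        String text = new String(input, clientCharset);   // bytes -> String
        return text.getBytes(StandardCharsets.UTF_8);     // String -> UTF-8
    }

    public static void main(String[] args) {
        // "é" as an ISO-8859-1 client would send it: one byte, 0xE9.
        byte[] fromClient = {(byte) 0xE9};

        byte[] utf8 = toUtf8(fromClient, StandardCharsets.ISO_8859_1);

        // In UTF-8 the same character takes two bytes: 0xC3 0xA9.
        System.out.println(utf8.length);
    }
}
```

When all three places already use Unicode, no step like this is needed,
which is why that setup is the simplest.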

Other things to note:

LOWER()/UPPER() only work correctly in a single-byte encoding (not
Unicode).
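A rough illustration of the kind of problem involved, on the assumption
that the case mapping works one byte at a time (that is my assumption
about the cause, and the code below is not the server's implementation):
in UTF-8 an accented letter occupies more than one byte, so a per-byte
routine never sees it as a letter. The class and method names are
invented for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class ByteCase {
    // Naive lowercasing that inspects one byte at a time (ASCII only).
    static String lowerPerByte(String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < b.length; i++) {
            if (b[i] >= 'A' && b[i] <= 'Z') {
                b[i] += 32; // 'A'..'Z' -> 'a'..'z'
            }
        }
        return new String(b, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String word = "\u00c9COLE"; // "ÉCOLE"

        // "É" is the two bytes 0xC3 0x89 in UTF-8, neither of which is
        // in the A-Z range, so it is left in upper case.
        System.out.println(lowerPerByte(word).equals("\u00c9cole"));

        // A character-aware routine lowercases it as expected.
        System.out.println(word.toLowerCase(Locale.ROOT).equals("\u00e9cole"));
    }
}
```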

If using binary data (bytea) via JDBC you may need to use a Unicode
db. I don't know if this has been fixed, but the server would attempt to
do an encoding conversion on the binary data:

http://archives.postgresql.org/pgsql-jdbc/2004-01/msg00045.php

Kris Jurka
