Re: Using java.lang.Character for "char" data type

From: Radosław Smogura <mail(at)smogura(dot)eu>
To: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: Using java.lang.Character for "char" data type
Date: 2010-05-29 09:12:56
Message-ID: 201005291112.56261.mail@smogura.eu
Lists: pgsql-jdbc

On Saturday, 22 May 2010 at 03:11:52, Lew wrote:

> I'm a little confused. When you say "char" is a byte long, are you
> referring to the SQL type or the Java type? I'm used to seeing the Java
> type expressed in lower case and the SQL type in upper case, so please
> pardon my confusion.
>
> The Java 'char' type is 16 bits wide.
>
> Doesn't the width of the SQL "CHAR" depend on the encoding?
>
> Otherwise how does it handle, say, UTF-8 when you tell the DB to use that?
>
> To put it another way, suppose I enter a String that contains, say, 24
> UTF-8 characters, some of which require multibyte encodings, and try to
> jam it into a "CHAR(24)" column or a "VARCHAR(24)" column. Will that
> cause trouble?
>
> The documentation for CHAR and VARCHAR at
> <http://www.postgresql.org/docs/8.4/interactive/datatype-character.html>
> says
> "SQL defines two primary character types: character varying(n) and
> character(n), where n is a positive integer. Both of these types can store
> strings up to n characters (not bytes) in length."
>
> That seems to contradict what you said.
>
As was said, a Java char is 16 bits wide and holds one UTF-16 code unit, so a
single char can represent most characters available in Java (characters
outside the Basic Multilingual Plane take two chars). An SQL CHAR represents
one character out of all possible SQL characters - nothing special, it's a
truism :)
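
To make the Java side concrete, here is a minimal sketch using only the
standard java.lang.Character API. The class name CharWidthDemo and the helper
charUnits are made up for this illustration.

```java
final class CharWidthDemo {
    // Number of 16-bit char values Java needs to hold the given code point.
    static int charUnits(int codePoint) {
        return Character.charCount(codePoint);
    }

    public static void main(String[] args) {
        // Width of the Java char type in bits.
        System.out.println("bits per char: " + Character.SIZE);

        // U+0142 (Polish letter l with stroke) lies in the Basic Multilingual Plane.
        System.out.println("chars for U+0142: " + charUnits(0x0142));

        // U+1D11E (musical G clef) lies outside it and needs a surrogate pair.
        System.out.println("chars for U+1D11E: " + charUnits(0x1D11E));

        // length() counts char values, codePointCount() counts characters.
        String clef = new String(Character.toChars(0x1D11E));
        System.out.println("length(): " + clef.length()
                + ", code points: " + clef.codePointCount(0, clef.length()));
    }
}
```

So a java.lang.Character is enough for a one-character SQL value only when
that character fits in a single char.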

These possible SQL characters are determined by the DB encoding. If you use an
8-bit encoding (ASCII, ISO, etc.), then you can put at most 256 distinct
characters in one slot (e.g. in a DB with ASCII encoding you can't store
ISO-8859-2 letters). The DB encoding only describes which characters are
possible and how they are stored on disk in binary form (it's the same as the
system encoding used when you read and write text files, or the internal
encoding a file system uses for its entries).
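
The same idea can be tried from Java with java.nio.charset, without touching
a database. This is only a sketch: the class EncodingDemo and its helpers are
invented for the example, and it assumes the JVM ships the ISO-8859-2 charset
(common, but not guaranteed by the Java specification).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingDemo {
    // True if every character of the text exists in the given encoding.
    static boolean canStore(String text, Charset encoding) {
        return encoding.newEncoder().canEncode(text);
    }

    // Number of bytes the text occupies once encoded.
    static int storedBytes(String text, Charset encoding) {
        return text.getBytes(encoding).length;
    }

    public static void main(String[] args) {
        String letter = "\u0142"; // Polish letter l with stroke
        Charset latin2 = Charset.forName("ISO-8859-2"); // assumed to be available

        System.out.println("ASCII can store it: "
                + canStore(letter, StandardCharsets.US_ASCII));
        System.out.println("ISO-8859-2 can store it: " + canStore(letter, latin2));
        System.out.println("bytes in ISO-8859-2: " + storedBytes(letter, latin2));
        System.out.println("bytes in UTF-8: "
                + storedBytes(letter, StandardCharsets.UTF_8));
    }
}
```

It is one character in every case; the encoding only decides whether it can
be stored at all and how many bytes it takes.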

Of course CHAR(1) should store any character from the set of characters the
DB supports (determined by its encoding).

The difference between CHAR and VARCHAR is the storage strategy: a VARCHAR
string is variable length, but a CHAR(n) string always has length n. If you
put only "A" into a CHAR(2) column, the select will return "A " (space at the
end).
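
The padding can be imitated in plain Java to see what a client gets back.
This is a simplified model and not the server's code: the helper asCharN is
hypothetical, and rejecting every over-long value is an assumption made to
keep the sketch short.

```java
final class CharPaddingDemo {
    // Imitates CHAR(n): pads the value with spaces up to n characters.
    // Assumption: values longer than n are simply rejected here.
    static String asCharN(String value, int n) {
        int length = value.codePointCount(0, value.length());
        if (length > n) {
            throw new IllegalArgumentException("value too long for CHAR(" + n + ")");
        }
        StringBuilder padded = new StringBuilder(value);
        for (int i = length; i < n; i++) {
            padded.append(' ');
        }
        return padded.toString();
    }

    public static void main(String[] args) {
        // Brackets make the trailing space visible.
        System.out.println("[" + asCharN("A", 2) + "]");
        System.out.println("[" + asCharN("AB", 2) + "]");
    }
}
```

A value read from a CHAR(n) column may therefore need trimming before it is
compared with the string that was originally written.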

So the answer to your question is: "You should not care about bit length as
long as your DB encoding supports all your characters, unless your task is to
optimize or predict the size of DB files". Leave the bit length of SQL
characters to the DB engine. I use a UTF-8 database and see no problem with
this. If you don't believe me, you can check it yourself.
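
A quick client-side check of the 24-character case from the question could
look like the sketch below. The class ColumnLengthDemo and its helpers are
made up for the example, and fitsIn only mirrors the documented rule that n
counts characters; it does not query a database.

```java
import java.nio.charset.StandardCharsets;

final class ColumnLengthDemo {
    // Number of characters (Unicode code points) in the text.
    static int characters(String text) {
        return text.codePointCount(0, text.length());
    }

    // Number of bytes the text takes when encoded as UTF-8.
    static int utf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Mirrors the rule quoted from the docs: n limits characters, not bytes.
    static boolean fitsIn(String text, int n) {
        return characters(text) <= n;
    }

    // Builds a sample of 24 multibyte letters (U+0142 repeated).
    static String sample() {
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < 24; i++) {
            text.append('\u0142');
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String text = sample();
        System.out.println("characters: " + characters(text));
        System.out.println("UTF-8 bytes: " + utf8Bytes(text));
        System.out.println("fits CHAR(24)/VARCHAR(24): " + fitsIn(text, 24));
    }
}
```

The byte count is larger than 24, but the character count is what the column
limit is compared against.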
