Re: Space requirements (with respect to foreign languages)

From: Markus Bertheau <twanger(at)bluetwanger(dot)de>
To: Gerard Samuel <php-db(at)trini0(dot)org>
Cc: pgsql-php(at)postgresql(dot)org
Subject: Re: Space requirements (with respect to foreign languages)
Date: 2004-08-28 14:39:37
Message-ID: 1093703976.2732.5.camel@teetnang
Lists: pgsql-php

On Thu, 26.08.2004, at 22:36, Gerard Samuel wrote:
> My site/code/database is developed primarily for the English language.
> I've had people from "The Far East" add content to my site using their
> native language, and it is displaying properly in the site.
> But I'm a bit concerned about the number of characters these languages use.
> For example, I've had someone enter ->
> chinese testing 中文
>
> It is saved in the database as ->
> chinese testing&#12288;&#20013;&#25991;

Your web page uses a character set that does not contain Chinese
characters, so the browser sent them as HTML numeric character
references instead. Each reference, as you correctly observed, takes
up several ASCII characters: &#20013; alone is eight characters long.
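
To make the arithmetic concrete, here is a minimal Python sketch of what
the browser effectively does. It is an illustration only: plain ASCII
stands in for whatever character set the page really declares (an
assumption), and the variable names are made up for this example.

```python
# Sketch: replace every character the target character set cannot
# represent with an HTML numeric character reference.
# U+3000 is the ideographic space; U+4E2D and U+6587 are the two
# Chinese characters from the example above.
text = "chinese testing\u3000\u4e2d\u6587"

# The "xmlcharrefreplace" error handler emits &#NNNNN; for each
# unencodable character instead of raising an error.
stored = text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

print(stored)       # chinese testing&#12288;&#20013;&#25991;
print(len(text))    # 18 characters as typed
print(len(stored))  # 39 characters as stored
```

The three non-ASCII characters become three references of eight
characters each, which is where the 24 comes from.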

> Now, forgive my ignorance, but I have no idea what the additional
> Chinese characters mean, but from the values in the database, I'm
> assuming that it amounts to 3 characters.
> But if I'm correct that those are 3 characters, it is
> using up 24 characters in a column.
>
> My concern is that if I were to limit a column to, say, 25 "English"
> characters, and a Chinese fellow comes by and hypothetically says
> "Hello World" in Chinese and goes over the limit of the column, the data
> will be truncated.

PostgreSQL will not truncate the data; it will reject it with an error.
But the general point is correct.

> Is there anything that can be done to overcome this shortcoming?
>
> I'm currently using PostgreSQL 7.4.2, using SQL_ASCII as the database
> character set, FreeBSD 4.10, PHP 4.3.6.

Change your site to use a character set that includes Chinese
characters, for example Unicode. The most common encoding of Unicode on
the web is UTF-8. It's also the encoding PostgreSQL uses when you use
UNICODE as the database encoding.

If you decide to switch your site to UTF-8 and want varchar(25) to mean
25 characters, and not 25 bytes, you have to change the database
encoding to UNICODE accordingly.
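
The difference between counting characters and counting bytes can be
sketched in Python. This is only an illustration of the counting, not of
PostgreSQL itself; fits_varchar is a hypothetical helper invented for
this example, and decoding the stored references with html.unescape is
an assumption about how one might recover the original text.

```python
import html

# The value as it currently sits in the SQL_ASCII database.
stored = "chinese testing&#12288;&#20013;&#25991;"

# Turn the numeric character references back into real characters.
text = html.unescape(stored)


def fits_varchar(value: str, limit: int) -> bool:
    """Hypothetical check: does value fit when the limit counts characters?"""
    return len(value) <= limit


print(len(text))                  # 18 characters
print(len(text.encode("utf-8")))  # 24 bytes in UTF-8
print(fits_varchar(text, 25))     # True: the real text fits in 25 characters
print(fits_varchar(stored, 25))   # False: the reference form needs 39
```

Each of the three non-ASCII characters here takes three bytes in UTF-8,
so the byte count and the character count differ even after the
references are gone.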

--
Markus Bertheau <twanger(at)bluetwanger(dot)de>
