Re: International support

From: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
To: dfunct(at)telus(dot)net
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: International support
Date: 2001-02-23 01:02:24
Message-ID: 20010223100224U.t-ishii@sra.co.jp
Lists: pgsql-general

> I'm currently working on a project that is intended to handle Japanese
> character sets - and now I'm told ideally iMode too. :) The iMode isn't
> such an issue at the moment - but the article below has spooked me a
> little. At an early point in the project we tested if putting some input
> into a web form, which ultimately was handled by php then stored in
> postgres would return fully intact - and it did. This left me comfortable
> that PHP and Postgres don't seem to care what language they're storing in
> fields or variables. I'm 'guessing' that this is because the data, whether
> it's English or Japanese, is being stored in binary (or something
> else?).

No. You are just lucky, I guess. If the data submitted by PHP is
encoded in EUC, it's ok, since EUC does not conflict with ASCII.
However, if it is encoded in SJIS, you are heading for big trouble.
The second byte of an SJIS character *sometimes* collides with ASCII
meta characters such as "\", and this will confuse the PostgreSQL
parser.
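
A minimal Python sketch of the collision described above. The sample
character and the use of Python's "shift_jis" and "euc_jp" codecs are
illustrative assumptions, not part of the original setup (PHP and
PostgreSQL); the point is only to show where the 0x5C byte comes from.

```python
# Sketch: why Shift_JIS (SJIS) bytes can confuse a parser that treats
# 0x5C ("\") as an escape character, while EUC-JP bytes cannot.
BACKSLASH = 0x5C


def contains_backslash_byte(text: str, encoding: str) -> bool:
    """Return True if the encoded form of text contains the byte 0x5C."""
    return BACKSLASH in text.encode(encoding)


def all_high_bytes(text: str, encoding: str) -> bool:
    """Return True if every encoded byte is outside the 7-bit ASCII range."""
    return all(b >= 0x80 for b in text.encode(encoding))


# Illustrative sample: a common kanji whose SJIS encoding is 0x95 0x5C.
sample = "表"

sjis_has_backslash = contains_backslash_byte(sample, "shift_jis")
euc_has_backslash = contains_backslash_byte(sample, "euc_jp")
euc_is_all_high = all_high_bytes(sample, "euc_jp")

print(sample.encode("shift_jis").hex(), sjis_has_backslash)
print(sample.encode("euc_jp").hex(), euc_has_backslash, euc_is_all_high)
```

In EUC every byte of a Japanese character has the high bit set, so none
of them can be mistaken for an ASCII character by a byte-oriented
parser.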

Of course the i18n version of PHP will help (it does the SJIS <--> EUC
conversion), but beware that some characters in SJIS (such as the
user-defined characters used especially in i-mode) are not well
supported in it.
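
As a rough illustration of what such a conversion step does, here is a
sketch in Python rather than PHP. The helper names sjis_to_euc and
euc_to_sjis are hypothetical, and the byte pair 0xF0 0x40 is only an
assumed example of a code in the SJIS user-defined area; the actual
behaviour of the PHP i18n functions may differ.

```python
from typing import Optional


def sjis_to_euc(data: bytes) -> Optional[bytes]:
    """Convert SJIS bytes to EUC-JP; return None if conversion fails."""
    try:
        return data.decode("shift_jis").encode("euc_jp")
    except UnicodeError:
        # Raised when a byte sequence has no mapping in either codec.
        return None


def euc_to_sjis(data: bytes) -> Optional[bytes]:
    """Convert EUC-JP bytes to SJIS; return None if conversion fails."""
    try:
        return data.decode("euc_jp").encode("shift_jis")
    except UnicodeError:
        return None


# Ordinary text survives the round trip SJIS -> EUC -> SJIS.
ordinary = "日本語".encode("shift_jis")
converted = sjis_to_euc(ordinary)
round_trip = euc_to_sjis(converted) if converted is not None else None

# Assumed example of a user-defined (gaiji) code: it has no EUC-JP
# counterpart, so this sketch reports it as unconvertible.
gaiji = b"\xf0\x40"
gaiji_converted = sjis_to_euc(gaiji)

print(round_trip == ordinary, gaiji_converted)
```

Data that may contain user-defined characters therefore needs an
explicit policy (reject, replace, or map by hand) before it reaches the
database.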

> Of
> course I wouldn't be able to sort the data or do anything else that would
> require PHP/Postgres to be able to interpret the data.

That would depend on how you define "sort". If you just do a normal
sort as you already do with ASCII, you could get more or less
reasonable results, I guess. But if your client requires more
"high level" sorts such as "sorting by YOMIGANA (Japanese
pronunciation)", you need to do something extra... probably you need
to define an extra field in your table.
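
A small sketch of the "extra field" idea. It uses Python's built-in
sqlite3 module purely as a stand-in so that it runs anywhere; the table
name, column names and sample rows are all made up for illustration,
and the same two-column layout would apply to a PostgreSQL table.

```python
import sqlite3

# In-memory database used only as a stand-in for the real server.
conn = sqlite3.connect(":memory:")

# "yomi" is the extra column holding the reading of "name" in hiragana.
conn.execute("CREATE TABLE customer (name TEXT, yomi TEXT)")
conn.executemany(
    "INSERT INTO customer (name, yomi) VALUES (?, ?)",
    [("山田", "やまだ"), ("青木", "あおきF"[:3]), ("佐藤", "さとう")],
)

# Plain sort: rows are ordered by the character codes of the kanji.
by_name = [row[0] for row in
           conn.execute("SELECT name FROM customer ORDER BY name")]

# Yomigana sort: rows are ordered by the reading stored in the extra column.
by_yomi = [row[0] for row in
           conn.execute("SELECT name FROM customer ORDER BY yomi")]

conn.close()

print(by_name)
print(by_yomi)
```

The application has to fill in the reading itself (for example from a
separate form input), since it cannot be derived reliably from the
kanji alone.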

> However if I compile
> Postgres with locale support for the character set/language in question -
> then postgres will be able to sort Japanese. Is this right?

No. Locale support is useless for Japanese; it just slows down
PostgreSQL. Turn it off.

> Have I got this all right so far? I have attempted to do my research on
> this - but finding a real beginners guide to international web development
> has been a trick. And the best sources I have found on this topic generally
> are specific to Oracle. Any links would be appreciated.

Try:
ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf

> For the postgres folks, these developers went with MySQL - I've chosen
> Postgres. Is there anything MySQL does that Postgres doesn't in terms of
> language support that I should be aware of?

I believe PostgreSQL's language support is much better than MySQL's,
especially for Japanese. PostgreSQL can handle both EUC and SJIS on
the fly (and even Unicode as of 7.1!), and can do automatic encoding
conversion between them. Moreover, PostgreSQL has many "multibyte
aware" functions, including regular expression search, which MySQL
cannot do, I think.
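
To show why "multibyte aware" matters for searching, here is a sketch
in Python. It does not reproduce PostgreSQL's implementation; the
sample string is an arbitrary assumption, chosen so that the bytes of
two adjacent EUC characters happen to contain a third, unrelated one.

```python
# Two hiragana characters; in EUC-JP each one takes two bytes.
text = "あい"
raw = text.encode("euc_jp")

# Take the 2nd byte of the first character and the 1st byte of the
# second one. Together they form a valid but unrelated EUC-JP character.
straddling = raw[1:3]
needle = straddling.decode("euc_jp")

# A byte-oriented search matches across the character boundary.
byte_level_hit = straddling in raw

# A character-oriented (multibyte aware) search finds nothing.
char_level_hit = needle in text

print(raw.hex(), byte_level_hit, char_level_hit)
```

A search function that works byte by byte reports a match that no
reader of the text would see, which is the kind of error multibyte
aware functions are meant to avoid.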

> >PHP's Japanese challenge
> >Since r-newbold.com is in Japanese only, Studio Omame made sure to utilize
> >PHP's Japanese character set conversion functions. However, this proved to
> >be a challenge.
>
> Is this available for v4 of PHP yet?

No.
--
Tatsuo Ishii
