Re: FW: Character set equivalent for AL32UTF8

From: Mridul Mathew <mridulmathew(at)gmail(dot)com>
To: ringerc(at)ringerc(dot)id(dot)au
Cc: rajeshwarbharathi(at)gmail(dot)com, pgsql-admin(at)postgresql(dot)org
Subject: Re: FW: Character set equivalent for AL32UTF8
Date: 2011-08-10 08:07:25
Message-ID: CAFm5QJwk3N5+5g2whZBUee2jHmNssxO98ybBreDLLUDQHh7zuA@mail.gmail.com
Lists: pgsql-admin

Hello Craig,

Thanks for the response. You are correct in that the difference between
al32utf8 and utf8 is in better support for supplementary characters with
al32utf8.

If supplementary characters are inserted into an Oracle UTF8 database, each
one is stored as a pair of UTF-16 surrogates and treated as 2 separate
undefined characters, occupying 6 bytes in storage instead of 4. Oracle
recommends using al32utf8 for any newly defined supplementary characters.
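
To make the byte counts concrete, here is a minimal Python sketch. It assumes
Oracle's older UTF8 character set behaves like CESU-8 (each supplementary
character is split into a UTF-16 surrogate pair and each surrogate is encoded
as 3 bytes), which is how the article quoted below describes it. The helper
name cesu8_encode is hypothetical; Python has no built-in codec for this form.

```python
def cesu8_encode(text: str) -> bytes:
    """Encode text the way the quoted article describes Oracle's legacy UTF8.

    Assumption: characters above U+FFFF become two UTF-16 surrogates,
    each written as a 3-byte sequence (6 bytes total).
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            # Split the code point into a UTF-16 surrogate pair.
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            for surrogate in (high, low):
                # "surrogatepass" lets Python emit the 3-byte form of a lone surrogate.
                out += chr(surrogate).encode("utf-8", "surrogatepass")
        else:
            # Characters in the Basic Multilingual Plane are plain UTF-8.
            out += ch.encode("utf-8")
    return bytes(out)


supplementary = "\U0001F600"  # a character beyond U+FFFF

print(len(supplementary.encode("utf-8")))  # standard UTF-8 (AL32UTF8 style): 4
print(len(cesu8_encode(supplementary)))    # surrogate-pair form: 6
```

Characters at or below U+FFFF come out identical in both encodings, so only
supplementary characters are affected by the difference.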

Does PostgreSQL make a distinction within Unicode in a similar fashion? We
have not tested our Oracle al32utf8 databases on PostgreSQL, but while
creating databases in PostgreSQL, we see UTF8 as an option, but not al32.

Thanks,
Mridul.

On Wed, Aug 10, 2011 at 1:26 PM, Mridul Mathew <mmathew(at)fiberlink(dot)com> wrote:

> *From:* Rajeshwar Bharathi [mailto:rajeshwarbharathi(at)gmail(dot)com]
> *Sent:* Wednesday, August 10, 2011 1:14 PM
> *To:* Mridul Mathew
> *Subject:* Fwd: [ADMIN] Character set equivalent for AL32UTF8
>
> ---------- Forwarded message ----------
> From: *Craig Ringer* <ringerc(at)ringerc(dot)id(dot)au>
> Date: Wed, Aug 10, 2011 at 11:49 AM
> Subject: Re: [ADMIN] Character set equivalent for AL32UTF8
> To: pgsql(dot)admin(at)googlegroups(dot)com
> Cc: RBharathi <rajeshwarbharathi(at)gmail(dot)com>, pgsql-admin(at)postgresql(dot)org
>
>
> On 2/08/2011 8:52 PM, RBharathi wrote:
>
> Hi,
> We plan to migrate data from Oracle 11g with characterset AL32UTF8 to a
> Postgres db.
>
> What is the equivalent character set to use in Postgres? We see only the
> UTF-8 option.
>
>
> What's AL32UTF8? That's not a standard charset name or widely recognised
> charset. Is it some Oracle specific feature? If so, what makes it different
> to UTF-8 and why do you need it?
>
> Documentation link? References?
>
> A 30-second Google search turned up this:
>
>
> http://decipherinfosys.wordpress.com/2007/01/28/difference-between-utf8-and-al32utf8-character-sets-in-oracle/
>
> "As far as these two character sets go in Oracle, the only difference
> between AL32UTF8 and UTF8 character sets is that AL32UTF8 stores characters
> beyond U+FFFF as four bytes (exactly as Unicode defines UTF-8). Oracle’s
> “UTF8” stores these characters as a sequence of two UTF-16 surrogate
> characters encoded using UTF-8 (or six bytes per character). Besides this
> storage difference, another difference is better support for supplementary
> characters in AL32UTF8 character set."
>
>
> Is this what you're talking about? If so, what's the concern? Have you
> checked to see if PostgreSQL's behavior fits your needs?
>
>
> --
> Craig Ringer
>
>
>
>
> --
> Rajeshwar BM
> Bangalore INDIA
