Discussion: Character set equivalent for AL32UTF8

Character set equivalent for AL32UTF8

From
RBharathi
Date:
Hi,
We plan to migrate data from Oracle 11g with character set AL32UTF8 to a Postgres db.

What is the equivalent character set to use in Postgres? We see only the UTF-8 option.

Please let me know.

RBharathi

Re: Character set equivalent for AL32UTF8

From
Craig Ringer
Date:
On 2/08/2011 8:52 PM, RBharathi wrote:
> Hi,
> We plan to migrate data from Oracle 11g with character set AL32UTF8 to a Postgres db.
>
> What is the equivalent character set to use in Postgres? We see only the UTF-8 option.

What's AL32UTF8? That's not a standard charset name or widely
recognised charset. Is it some Oracle-specific feature? If so, what
makes it different to UTF-8 and why do you need it?

Documentation link? References?

A 30-second Google search turned up this:

http://decipherinfosys.wordpress.com/2007/01/28/difference-between-utf8-and-al32utf8-character-sets-in-oracle/

"As far as these two character sets go in Oracle,  the only difference
between AL32UTF8 and UTF8 character sets is that AL32UTF8 stores
characters beyond U+FFFF as four bytes (exactly as Unicode defines
UTF-8). Oracle’s “UTF8” stores these characters as a sequence of two
UTF-16 surrogate characters encoded using UTF-8 (or six bytes per
character).  Besides this storage difference, another difference is
better support for supplementary characters in AL32UTF8 character set."


Is this what you're talking about? If so, what's the concern? Have you
checked to see if PostgreSQL's behavior fits your needs?


--
Craig Ringer
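To make the byte counts in the quoted article concrete, here is a small Python sketch. It is an illustration only, not Oracle's or PostgreSQL's code. It encodes a character in standard UTF-8 (what the article says AL32UTF8 uses, and what PostgreSQL's UTF8 uses according to this thread) and in the surrogate-pair form the article attributes to Oracle's older "UTF8" (a scheme usually called CESU-8). The helper names `standard_utf8` and `surrogate_pair_utf8` are hypothetical.

```python
# Illustration only: compares standard UTF-8 with the surrogate-pair scheme
# (CESU-8) that the quoted article attributes to Oracle's older "UTF8".
# The helper names are hypothetical, not part of any Oracle or PostgreSQL API.


def standard_utf8(ch: str) -> bytes:
    """Standard UTF-8, as used by AL32UTF8 and PostgreSQL's UTF8."""
    return ch.encode("utf-8")


def surrogate_pair_utf8(ch: str) -> bytes:
    """Split ch into UTF-16 code units, then encode each unit on its own.

    A supplementary character (above U+FFFF) becomes two surrogates of
    three bytes each, six bytes in total; BMP characters are unchanged.
    """
    units = ch.encode("utf-16-be")  # big-endian, no byte-order mark
    out = bytearray()
    for i in range(0, len(units), 2):
        unit = int.from_bytes(units[i:i + 2], "big")
        # 'surrogatepass' allows a lone surrogate to be written as 3 bytes.
        out += chr(unit).encode("utf-8", "surrogatepass")
    return bytes(out)


for ch in ("\u00e9", "\U0001F600"):  # é and a supplementary emoji
    std, sur = standard_utf8(ch), surrogate_pair_utf8(ch)
    print(f"U+{ord(ch):04X}: {std.hex(' ')} ({len(std)} bytes) vs "
          f"{sur.hex(' ')} ({len(sur)} bytes)")
```

For U+1F600 this prints `f0 9f 98 80` (4 bytes) against `ed a0 bd ed b8 80` (6 bytes), matching the four-versus-six-byte difference described in the article; for a BMP character such as é both forms are identical.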

Re: FW: Character set equivalent for AL32UTF8

From
Craig Ringer
Date:
On 10/08/2011 4:07 PM, Mridul Mathew wrote:

> Does PostgreSQL make a distinction within Unicode in a similar fashion?

No.

> We have not tested our Oracle al32utf8 databases on PostgreSQL, but
> while creating databases in PostgreSQL, we see UTF8 as an option, but
> not al32.

al32utf8 is Oracle-specific and doesn't seem to be defined anywhere else.

What _application_ _level_ impact does this have for you? What changes
do your apps expect to see in their use of or communication with the
database?

I strongly suggest that you _test_ this in Pg and see.

--
Craig Ringer

Re: FW: Character set equivalent for AL32UTF8

From
Mridul Mathew
Date:
Hello Craig,

Thanks for the response. You are correct in that the difference between al32utf8 and utf8 is in better support for supplementary characters with al32utf8.

If supplementary characters are inserted in a UTF8 database, they will be treated as 2 separate undefined characters, occupying 6 bytes in storage. Oracle recommends using al32utf8 for any newly defined supplementary characters.

Does PostgreSQL make a distinction within Unicode in a similar fashion? We have not tested our Oracle al32utf8 databases on PostgreSQL, but while creating databases in PostgreSQL, we see UTF8 as an option, but not al32.

Thanks,
Mridul.

On Wed, Aug 10, 2011 at 1:26 PM, Mridul Mathew <mmathew@fiberlink.com> wrote:

From: Rajeshwar Bharathi [mailto:rajeshwarbharathi@gmail.com]
Sent: Wednesday, August 10, 2011 1:14 PM
To: Mridul Mathew
Subject: Fwd: [ADMIN] Character set equivalent for AL32UTF8

---------- Forwarded message ----------
From: Craig Ringer <ringerc@ringerc.id.au>
Date: Wed, Aug 10, 2011 at 11:49 AM
Subject: Re: [ADMIN] Character set equivalent for AL32UTF8
To: pgsql.admin@googlegroups.com
Cc: RBharathi <rajeshwarbharathi@gmail.com>, pgsql-admin@postgresql.org


--
Rajeshwar BM
Bangalore INDIA




Re: FW: Character set equivalent for AL32UTF8

From
"Kevin Grittner"
Date:
Mridul Mathew <mridulmathew@gmail.com> wrote:

> From: *Craig Ringer* <ringerc@ringerc.id.au>

>> A 30-second Google search turned up this:
>>
>> http://decipherinfosys.wordpress.com/2007/01/28/difference-between-utf8-and-al32utf8-character-sets-in-oracle/

> If supplementary characters are inserted in a UTF8 database, they
> will be treated as 2 separate undefined characters, occupying 6
> bytes in storage. Oracle recommends using al32utf8 for any newly
> defined supplementary characters.
>
> Does PostgreSQL make a distinction within Unicode in a similar
> fashion?

It sounds as though Oracle initially failed to properly implement
the UTF-8 character encoding scheme, but rather than fix the broken
scheme they created an alternative.  So far as I know, PostgreSQL
should be using proper UTF-8 encoding if you ask for it, without any
special gyrations.

-Kevin
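Following Kevin's point, one practical pre-migration check is whether exported text is standard UTF-8 at all. Data from an AL32UTF8 database should already be. Data containing the six-byte surrogate form would not be, and a PostgreSQL UTF8 database would likely reject it as an invalid byte sequence, so it is worth testing as Craig suggests. The Python sketch below (helper names hypothetical; it operates on raw bytes and does not talk to Oracle or PostgreSQL) validates bytes with a strict decoder and shows one way to re-encode surrogate pairs as four-byte UTF-8.

```python
# Hypothetical pre-migration helpers; they operate on raw bytes only.

# U+1F600 in the six-byte surrogate-pair form and in standard UTF-8.
SURROGATE_FORM = bytes.fromhex("eda0bdedb880")
STANDARD_FORM = bytes.fromhex("f09f9880")


def is_strict_utf8(data: bytes) -> bool:
    """True if data decodes as standard UTF-8 (encoded surrogates are rejected)."""
    try:
        data.decode("utf-8")  # Python's UTF-8 codec is strict by default
    except UnicodeDecodeError:
        return False
    return True


def repair_surrogate_pairs(data: bytes) -> bytes:
    """Re-encode surrogate-pair sequences as standard four-byte UTF-8.

    'surrogatepass' lets the UTF-8 codec decode the lone surrogates; a round
    trip through UTF-16 then joins each high/low pair into one code point.
    Unpaired surrogates would still raise UnicodeDecodeError here.
    """
    text = data.decode("utf-8", "surrogatepass")
    joined = text.encode("utf-16", "surrogatepass").decode("utf-16")
    return joined.encode("utf-8")


print(is_strict_utf8(STANDARD_FORM))    # True
print(is_strict_utf8(SURROGATE_FORM))   # False
print(repair_surrogate_pairs(SURROGATE_FORM) == STANDARD_FORM)  # True
```

The UTF-16 round trip is used because Python's UTF-16 decoder already knows how to combine a high and a low surrogate; text that contains no surrogates passes through unchanged.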