What is Unicode?
Unicode is a universal encoded character set that allows you to store
information from any language using a single character set.
Unicode provides a unique code value for every character, regardless of the platform, program, or language. The Unicode standard has been adopted by many software and hardware vendors, and many operating systems and browsers now support Unicode. Unicode is required by modern standards such as XML, Java, JavaScript, LDAP, CORBA 3.0, and WML, and it is compliant with the ISO/IEC 10646 standard. Oracle started supporting Unicode as a database character set in Oracle7. In Oracle9i, Unicode support has been greatly expanded so that customers can find the right solution for their globalization needs. Oracle9i supports Unicode 3.0, the third major version of the Unicode standard.
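To make the idea of a unique code value concrete, the following sketch prints the code point and standard character name for a few characters. Python is used here only for illustration, and the helper `code_point_label` is hypothetical; the values come from the built-in `ord` function and the Unicode Character Database bundled with Python's `unicodedata` module, not from Oracle.

```python
import unicodedata


def code_point_label(ch: str) -> str:
    """Return a label such as 'U+0041 LATIN CAPITAL LETTER A' for a single character."""
    # ord() yields the character's Unicode code point, which is the same on every platform.
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# A, é, Ж, 中 (written as escapes so the source file's own encoding does not matter)
for ch in ["A", "\u00e9", "\u0416", "\u4e2d"]:
    print(code_point_label(ch))
```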
Unicode Encoding
There are two common ways to encode Unicode 3.0 characters: UTF-16 and UTF-8.
UTF-8 Encoding
This is the 8-bit encoding of Unicode. It is a variable-width multibyte encoding in which the character codes 0x00 through 0x7F have the same meaning as ASCII. A Unicode 3.0 character occupies 1, 2, or 3 bytes in this encoding (code points above U+FFFF, used by later versions of Unicode, take 4 bytes). Generally, characters from European scripts are represented in either 1 or 2 bytes, while characters from most Asian scripts are represented in 3 bytes.
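The 1-, 2-, and 3-byte forms follow fixed bit patterns. The Python sketch below builds those byte sequences by hand and checks them against Python's built-in UTF-8 codec. The function `utf8_encode_bmp` is a hypothetical helper written for illustration and deliberately handles only code points U+0000 through U+FFFF.

```python
def utf8_encode_bmp(cp: int) -> bytes:
    """Encode one code point in U+0000..U+FFFF as UTF-8 (illustrative sketch only)."""
    if not 0 <= cp <= 0xFFFF:
        raise ValueError("this sketch only handles U+0000..U+FFFF")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded on their own")
    if cp <= 0x7F:
        # 1 byte, 0xxxxxxx: identical to the ASCII byte value
        return bytes([cp])
    if cp <= 0x7FF:
        # 2 bytes, 110xxxxx 10xxxxxx: accented Latin, Greek, Cyrillic, Hebrew, Arabic, ...
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    # 3 bytes, 1110xxxx 10xxxxxx 10xxxxxx: the rest of U+0800..U+FFFF, including CJK
    return bytes([0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)])


# A, é, Ж, 中
for ch in ["A", "\u00e9", "\u0416", "\u4e2d"]:
    encoded = utf8_encode_bmp(ord(ch))
    assert encoded == ch.encode("utf-8")  # agrees with Python's own UTF-8 codec
    print(f"U+{ord(ch):04X}: {len(encoded)} byte(s), {encoded.hex()}")
```

The leading bits of the first byte tell a decoder how many bytes belong to the character, which is what makes the encoding variable-width while keeping ASCII bytes unchanged.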
UTF-16 Encoding
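This is the 16-bit encoding of Unicode. For the characters defined in Unicode 3.0, all of which lie in the range U+0000 through U+FFFF, it is a 2-byte fixed-width encoding in which the character codes 0x0000 through 0x007F have the same meaning as ASCII. Each Unicode 3.0 character occupies 2 bytes in this encoding, so characters from all scripts are represented in 2 bytes. (Code points above U+FFFF, used by later versions of Unicode, are encoded as a surrogate pair of two 16-bit units, or 4 bytes.)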
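The storage trade-off between UTF-8 and UTF-16 can be checked with a short Python sketch. The helper `encoded_sizes` is hypothetical; it uses Python's `utf-16-be` codec, which writes 2-byte big-endian units without the byte-order mark that Python's plain `utf-16` codec prepends, so the counts reflect character data only.

```python
def encoded_sizes(text: str) -> dict:
    """Return the number of bytes text occupies in UTF-8 and in UTF-16 (big-endian, no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-be")),
    }


# ASCII characters keep their values in UTF-16 but take 2 bytes: 'A' becomes 0x0041.
print("A".encode("utf-16-be").hex())

samples = {
    "English": "Unicode",
    "French": "caf\u00e9",      # café
    "Chinese": "\u4e2d\u6587",  # 中文
}
for label, text in samples.items():
    print(label, encoded_sizes(text))
```

For these samples, the English and French text is smaller in UTF-8, while the Chinese text is smaller in UTF-16 (4 bytes versus 6), consistent with the per-character sizes given for the two encodings.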