Changing character set in database

January 25, 2010

character set conversion, csalter, csscan, database, utf8

If your database is using anything else than UTF8 as database character set you may consider to migrate from it. Oracle states in the Globalization Guide (10gR2):

At the top of the list of character sets Oracle recommends for all new system deployment is the Unicode character set AL32UTF8.

Depending on what characters you actually have in your database you have three options on how to do this:

  1. Changing the character set with the package CSALTER. Only data dictionary is migrated.

  2. Using CSALTER and convert application data using export/import for all users

  3. Using CSALTER and convert a subset of application data.

Option 1 migrates the data dictionary, that is, the meta data regarding your application data. Nothing else is changed and Oracle, after successfully executing CSALTER, behaves as if your database was created with the new character set. That works OK if the characters used are actually encoded the same way in the new character set as before, if not funny letters will show up when querying the database, meaning you have to go for option 2 (or possibly 3).

Character expansion happens when a symbol changes from one character set to another with more bytes being used for storage in the latter. This is the case for the euro sign (€) in WE8MSWIN, where it requires one byte, in AL32UTF8 it takes three bytes. This requires a conversion of application data using export/import.

Oracle has made an utility for this, csscan. It checks the data dictionary and application data and gives you an analysis telling you what to do. The following procedure was executed on Oracle 10.2.0.4.

First create these two directories are defined in the database:

  • log_file_dir
  • data_file_dir

I don’t think they are really necessary, they don’t seem to be used, but the following script grants access to them for the CSMIG user. Directories are listed in DBA_DIRECTORIES and you may create these anywhere you like with:

create directory log_file_dir as ‘/tmp/csscan’;
create directory data_file_dir as ‘/tmp/csscan’;

Run the following script as SYS in sqlplus to install the tool:

@?/rdbms/admin/csminst.sql

It will ask for a password which you’ll use later (on 9iR2 it uses standard password CSMIG). Then start the scanning with:

csscan csmig/your_password full=y tochar=al32utf8 process=3 array=1024000

The csscan command is documented in chapter 12 in the Globalization Guide. The above analyzes the entire database. Otherwise owner, tables or columns may be specified. The process parameter selects the number of simultaneous scanning processes.

When scanning is completed csscan create a database scan report and individual exception reports as scan.txt and scan.err respectively. The file scan.out contains the output from the command above. These files are created in the directory that was current when the command was executed.