How to set up Unicode

You are viewing revision #13 of this wiki article.

  1. PHP script files
  2. Database tables
  3. Database connection
  4. Webserver/HTTP-Header
  5. PHP string functions

Problems with displaying special characters from other languages have a once-and-for-all fix: use Unicode (UTF-8) everywhere. If every part of your setup uses Unicode, your application can support almost any language.

Several places may need configuration changes before Unicode works throughout:

1. PHP script files

Make sure you use an editor that can handle UTF-8, and save all your files UTF-8 encoded without a BOM (byte order mark). PHP sends a BOM at the start of a file to the browser as output, which can, for example, prevent later header() calls from working. If your project contains older non-Unicode files, open them in your editor and save them again as UTF-8. On Linux you can also use command-line tools like recode or iconv to convert many files at once.
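The same conversion can also be done from PHP itself. This is a minimal sketch, assuming that any file which is not valid UTF-8 is ISO-8859-1 (Latin-1) encoded; the function name `convertFileToUtf8()` is hypothetical. Files that are already valid UTF-8 are left unchanged apart from stripping a BOM.

```php
<?php
// Minimal sketch: re-save a PHP source file as UTF-8 without BOM.
// Assumption: files that are not valid UTF-8 are ISO-8859-1 encoded.
function convertFileToUtf8(string $path, string $fromEncoding = 'ISO-8859-1'): void
{
    $content = file_get_contents($path);
    if ($content === false) {
        throw new RuntimeException("Cannot read $path");
    }

    // preg_match() with the /u modifier fails on invalid UTF-8,
    // which tells us the file still needs converting.
    if (preg_match('//u', $content) !== 1) {
        $content = iconv($fromEncoding, 'UTF-8', $content);
        if ($content === false) {
            throw new RuntimeException("Cannot convert $path from $fromEncoding");
        }
    }

    // Drop a leading UTF-8 byte order mark (EF BB BF) if present.
    if (strncmp($content, "\xEF\xBB\xBF", 3) === 0) {
        $content = substr($content, 3);
    }

    if (file_put_contents($path, $content) === false) {
        throw new RuntimeException("Cannot write $path");
    }
}
```

Checking for valid UTF-8 first means running the function twice on the same file does not encode its characters twice.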

2. Database tables

Every table in your database needs to use the UTF-8 charset for its content. How to configure this differs between database systems.

MySQL

To find out whether a table uses the utf8 charset, look at its CREATE statement. You can find it, for example, in the output of phpMyAdmin's export feature.

>Info: Don't confuse the character encoding of a table with its collation. The collation determines how strings are compared and sorted in queries. It can easily be changed, e.g. with phpMyAdmin, or even for a single query.

You could also issue this SQL statement:

[sql]
SHOW CREATE TABLE your_tablename;

You'll see a CREATE statement with the CHARSET information at the end. It should look like this:

[sql]
CREATE TABLE IF NOT EXISTS `your_tablename` (
  .... your field definitions ...
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
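If you have many tables to check, a small script can run this statement for you. The following is a minimal sketch, assuming a MySQL PDO connection in `$pdo`. The function names are hypothetical. It reads the `DEFAULT CHARSET=...` clause from the output of `SHOW CREATE TABLE`.

```php
<?php
// Minimal sketch: report the default charset of MySQL tables.
// Function names are hypothetical; $pdo is assumed to be a MySQL PDO connection.

// Extract the value of "DEFAULT CHARSET=..." from a CREATE TABLE statement.
function charsetFromCreateStatement(string $createSql): ?string
{
    if (preg_match('/DEFAULT CHARSET=(\w+)/i', $createSql, $m) === 1) {
        return $m[1];
    }
    return null; // no table-level charset clause found
}

// Run SHOW CREATE TABLE for one table and return its charset.
function tableCharset(PDO $pdo, string $table): ?string
{
    // Identifiers cannot be bound as parameters, so quote the name with backticks.
    $quoted = '`' . str_replace('`', '``', $table) . '`';
    $row = $pdo->query('SHOW CREATE TABLE ' . $quoted)->fetch(PDO::FETCH_NUM);
    // Column 0 holds the table name, column 1 the CREATE statement.
    return $row === false ? null : charsetFromCreateStatement($row[1]);
}

// Usage (requires a database connection):
// foreach ($pdo->query('SHOW TABLES')->fetchAll(PDO::FETCH_COLUMN) as $t) {
//     echo $t, ': ', tableCharset($pdo, $t), PHP_EOL;
// }
```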

If your table doesn't use the UTF-8 charset yet, the easiest way to change this is to export the table, adapt the CHARSET parameter of its CREATE statement and re-import the table into the database.

Be very careful when doing this conversion, and make sure you save the file with the changed SQL statement as UTF-8, converting it if necessary. Otherwise you can easily end up with mixed-up encodings, e.g. ISO-8859-1 encoded characters in a table with utf8 CHARSET.
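The editing step can be scripted. Below is a minimal sketch for preparing an exported dump. It assumes the tables were declared with `DEFAULT CHARSET=latin1` and that a dump which is not valid UTF-8 is ISO-8859-1 encoded. The function name `convertDumpToUtf8()` is hypothetical. It leaves other statements in the dump, such as `SET NAMES`, untouched, so review the result before re-importing it.

```php
<?php
// Minimal sketch: prepare an exported SQL dump for re-import with utf8 CHARSET.
// Assumptions: tables were declared with DEFAULT CHARSET=latin1, and a dump
// that is not valid UTF-8 is ISO-8859-1 encoded.
function convertDumpToUtf8(string $sql): string
{
    // Re-encode the file content only if it is not valid UTF-8 yet,
    // so already-converted text is not encoded a second time.
    if (preg_match('//u', $sql) !== 1) {
        $sql = iconv('ISO-8859-1', 'UTF-8', $sql);
    }

    // Adapt the CHARSET parameter of every CREATE TABLE statement.
    return preg_replace('/DEFAULT CHARSET=latin1\b/i', 'DEFAULT CHARSET=utf8', $sql);
}

// Usage:
// file_put_contents('dump-utf8.sql', convertDumpToUtf8(file_get_contents('dump.sql')));
```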

>Tip: To make utf8 the server's default CHARSET, so that newly created databases and their tables use it unless told otherwise, you can add this to your MySQL configuration (e.g. my.cnf file):
>
>~~~
>[mysqld]
>character-set-server = utf8
># for older versions:
>default-character-set = utf8
>~~~

3. Database connection

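When connecting to a database, a client like PHP has to use a specific charset encoding. To specify the charset for a connection in Yii, set the charset property of the db component in your application configuration. This property is only used for MySQL and PostgreSQL connections:

return array(
    // ...
    'components'=>array(
        // ...
        'db'=>array(
            'connectionString'=>'mysql:host=localhost;dbname=your_database',
            // charset for the connection (MySQL and PostgreSQL only)
            'charset'=>'utf8',
        ),
    ),
    // ...
);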

4. Webserver/HTTP-Header

We also need to tell the browser that our pages use UTF-8. The best place to do this is the Content-Type header of the HTTP response. How to configure it varies between web server software.

>Tip: If you use this approach, there's no need to add encoding information (e.g. a meta tag) to the pages themselves. The HTTP header is enough.

Apache

You can set the UTF-8 charset either in a VirtualHost section of your server configuration or by adding this line to a .htaccess file in your DocumentRoot:

AddDefaultCharset UTF-8
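If you cannot change the server configuration, PHP can send the same header itself, as long as it does so before any output. Here is a minimal sketch. `contentTypeHeader()` is a hypothetical helper, not part of PHP or Yii. PHP's own `default_charset` setting (UTF-8 by default since PHP 5.6) also controls the charset PHP adds to its Content-Type header.

```php
<?php
// Minimal sketch: send the charset in the HTTP header from PHP.
// contentTypeHeader() is a hypothetical helper, not part of PHP or Yii.
function contentTypeHeader(string $mimeType = 'text/html', string $charset = 'UTF-8'): string
{
    return "Content-Type: $mimeType; charset=$charset";
}

// Headers can only be sent before the first byte of output.
if (!headers_sent()) {
    header(contentTypeHeader());
}
```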

5. PHP string functions

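PHP's standard string functions such as strlen() operate on bytes rather than characters, so operations like string length validation give wrong results for UTF-8 text containing multibyte characters. Native Unicode support was planned for PHP 6, but that version was abandoned.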

mbstring

The alternative is to use the mbstring functions instead of their non-multibyte-aware counterparts. Since mbstring is a non-default extension, it might not be available on every host. That's one of the reasons why Yii uses non-multibyte functions like strlen() instead of mb_strlen() by default.
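The difference is easy to see with a word containing non-ASCII characters. In UTF-8, "ü" and "ß" take two bytes each, so strlen() counts 7 bytes where mb_strlen() counts 5 characters. This sketch assumes the file is saved as UTF-8 and the mbstring extension is loaded.

```php
<?php
// Compare byte length and character length of a UTF-8 string.
// Requires the mbstring extension for mb_strlen() and mb_substr().
$word = 'Grüße'; // 5 characters, 7 bytes in UTF-8

echo strlen($word), PHP_EOL;             // 7: counts bytes
echo mb_strlen($word, 'UTF-8'), PHP_EOL; // 5: counts characters

// Byte-based functions can also cut a multibyte character in half:
echo substr($word, 0, 3), PHP_EOL;             // "Gr" plus the first byte of "ü"
echo mb_substr($word, 0, 3, 'UTF-8'), PHP_EOL; // "Grü"
```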

Using mbstring with Yii 1.1.1 and later

Since version 1.1.1 you can set the encoding property of CStringValidator. If you set it to UTF-8, the validator uses the mbstring functions to check string lengths.
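For example, a length rule in a model's rules() method could look like the fragment below. The attribute name title and the limit of 100 are hypothetical. 'length' is Yii's built-in alias for CStringValidator. The fragment needs Yii and is not runnable on its own.

```php
// Inside a CModel subclass such as a CFormModel or CActiveRecord.
// The attribute 'title' and the maximum of 100 are example values.
public function rules()
{
    return array(
        // With 'encoding' set, the length is measured in characters
        // using mbstring when the extension is available.
        array('title', 'length', 'max'=>100, 'encoding'=>'UTF-8'),
    );
}
```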

Using mbstring with older versions of Yii

A workaround for older releases is mbstring's function overloading feature, which replaces the non-multibyte-aware functions with their mbstring counterparts. To set it up, add this to your php.ini:

mbstring.func_overload = 7
mbstring.internal_encoding = "UTF-8"

Alternatively, you can enable it for a single VirtualHost in the corresponding section of your Apache configuration:

php_admin_value mbstring.func_overload "7"
php_admin_value mbstring.internal_encoding "UTF-8"

>Note: Setting this in an .htaccess file is not recommended, as it may lead to undefined behavior.
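The value 7 is a bitmask. 1 overloads mail(), 2 the string functions such as strlen() and substr(), and 4 the ereg regular-expression functions, so 7 enables all three. The sketch below decodes the active setting at runtime. The function name `describeFuncOverload()` is hypothetical. mbstring.func_overload was deprecated in PHP 7.2 and removed in PHP 8.0, where ini_get() returns false for it.

```php
<?php
// Sketch: decode an mbstring.func_overload value into the overloaded groups.
// Bit values per the PHP manual: 1 = mail(), 2 = string functions, 4 = ereg functions.
function describeFuncOverload(int $value): array
{
    $groups = array(1 => 'mail', 2 => 'string', 4 => 'regex');
    $enabled = array();
    foreach ($groups as $bit => $name) {
        if (($value & $bit) !== 0) {
            $enabled[] = $name;
        }
    }
    return $enabled;
}

// ini_get() returns false when the setting is unknown (e.g. PHP 8.0+ or no mbstring).
$setting = ini_get('mbstring.func_overload');
$current = ($setting === false) ? 0 : (int) $setting;
$names = describeFuncOverload($current);
echo 'Overloaded: ', $names ? implode(', ', $names) : 'none', PHP_EOL;
```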

Category: Tutorials
Tags: i18n, unicode
Written by: Mike
Last updated by: Roman Solomatin
Created on: Feb 21, 2009
Last updated: 11 years ago