Revision #21 has been created by fsb on Dec 18, 2011, 7:32:28 PM with the memo:
Removed reference to PHP 6. Forget about it.
Changes
Title
unchanged
How to set up Unicode
Category
unchanged
How-tos
Yii version
unchanged
Tags
unchanged
i18n, unicode
Content
changed
[...]
>character-set-server = utf8
># for older versions:
>default-character-set = utf8
>~~~
### MySQL indexes
utf8 is efficient if the data is mostly English (which is often true for web apps) because its variable-length encoding uses one byte for each character of the English alphabet. For accented Latin characters and other alphabets it uses multiple bytes per character. For indexes, however, MySQL uses a fixed-length encoding and reserves 3 bytes for every character regardless. Converting an indexed latin1 table to utf8 will therefore triple the index size, which slows the index down. This also explains why the maximum width of indexed columns is smaller with utf8. In MyISAM, where an index key is limited to 1000 bytes, an indexed latin1 column can be up to VARCHAR(1000) but a utf8 column is limited to VARCHAR(333). InnoDB, with a 767-byte key limit, can index latin1 up to VARCHAR(767) and utf8 only up to VARCHAR(255).
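The column limits above are the engine's maximum key length in bytes divided by the bytes reserved per character. A minimal sketch of that arithmetic, assuming key limits of 1000 bytes for MyISAM and 767 bytes for InnoDB; the function name `maxIndexedChars` is made up for illustration and is not part of MySQL or Yii:

```php
<?php
// Hypothetical helper: how many characters fit in an index key
// when every character reserves a fixed number of bytes.
function maxIndexedChars(int $keyLimitBytes, int $bytesPerChar): int
{
    return intdiv($keyLimitBytes, $bytesPerChar);
}

// Assumed key-length limits in bytes per storage engine.
$limits = ['MyISAM' => 1000, 'InnoDB' => 767];
// Bytes reserved per character in an index.
$charsets = ['latin1' => 1, 'utf8' => 3];

foreach ($limits as $engine => $bytes) {
    foreach ($charsets as $charset => $width) {
        printf("%s %s: VARCHAR(%d)\n", $engine, $charset, maxIndexedChars($bytes, $width));
    }
}
```

Under these assumptions the loop prints VARCHAR(1000) and VARCHAR(333) for MyISAM, and VARCHAR(767) and VARCHAR(255) for InnoDB.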
## 3. Database connection ##
When connecting to a database, a client like PHP has to use a specific charset encoding.[...]
## 5. PHP string functions ##
PHP needs to use UTF-8 internally for things such as string length validation to work correctly. Native Unicode support was planned for PHP 6, but that release was abandoned, so PHP's core string functions still operate on bytes rather than characters.
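To see why length validation can go wrong, here is a small sketch of what `strlen()` reports for a UTF-8 string. The 5-character limit and the variable names are assumptions chosen for illustration, not taken from Yii:

```php
<?php
// "naïve" written with explicit UTF-8 bytes so the result does not
// depend on the encoding this file is saved in.
$word = "na\xC3\xAFve";

// strlen() counts bytes: the single character "ï" occupies two.
$byteLength = strlen($word);

// A hypothetical "at most 5 characters" rule applied with strlen()
// rejects a word that is actually 5 characters long.
$maxLength = 5;
$passes = $byteLength <= $maxLength;

var_dump($byteLength); // int(6)
var_dump($passes);     // bool(false)
```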
### mbstring
The alternative is to use [mbstring functions](http://de.php.net/manual/en/ref.mbstring.php) instead of their non-multibyte-aware counterparts. Since mbstring is a non-default extension, it might not be available on every host. That is one of the reasons why Yii uses non-multibyte functions like strlen() instead of mb_strlen() by default.[...]
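Because mbstring may be missing on some hosts, code that needs a character count can check for it at runtime. The following is only a sketch under that assumption: `utf8Length` is a hypothetical helper, not something Yii provides, and the PCRE fallback assumes PHP's PCRE was built with UTF-8 support.

```php
<?php
// Hypothetical helper: character count of a UTF-8 string that works
// with or without the mbstring extension.
function utf8Length(string $text): int
{
    if (function_exists('mb_strlen')) {
        return mb_strlen($text, 'UTF-8');
    }
    // Fallback: with the /u modifier PCRE matches whole UTF-8
    // characters; /s lets "." match newlines too.
    $count = preg_match_all('/./us', $text);
    if ($count === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }
    return $count;
}

$word = "na\xC3\xAFve"; // "naïve" as explicit UTF-8 bytes
var_dump(strlen($word));     // int(6)
var_dump(utf8Length($word)); // int(5)
```

The two branches differ on malformed input: the fallback throws, while `mb_strlen()` still returns a count, so validate the encoding separately if that matters.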