PHP - UTF-8 collation: difference between unicode and danish
Well hello there. I'm changing the charset of my database from latin1_swedish_ci to utf8. I've always used utf8_danish_ci because it's the closest to the Norwegian character style, I think.
But how does it compare with utf8_general_ci and utf8_unicode_ci?
Some time ago, it was preferred to use _general_ci for better/faster performance, and _unicode_ci for more accuracy, because the sorting algorithm is more complex in the latter. Since speed/performance is no longer an issue, or not an issue in most cases anymore, is _unicode_ci OK to use in most situations?
But how does _unicode_ci differ from _danish_ci? Are the last 3 letters of the Nordic alphabet, æ, ø and å, taken into account?
Most comparisons (one vs. the other) I can find are between _general_ci and _unicode_ci. If anyone knows of examples of when to use _unicode_ci and when to use _danish_ci, it would be highly appreciated...
In short, if your app is multilingual and stores multiple languages within the same tables, you're screwed and should worry about sorting/collating outside of the database, whether you use utf8_general_ci or another collation.
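To make "sorting outside of the database" a bit more concrete, here is a minimal Python sketch of an application-side sort key for the Danish/Norwegian alphabet. The names `DANISH_ALPHABET`, `RANK` and `danish_key` are made up for illustration, and the key is deliberately simplified: it only encodes the letter order a-z, æ, ø, å and ignores things like accents, "aa" as a spelling of "å", and case tie-breaking. A real application would more likely lean on a proper collation library.

```python
# Simplified assumption: the Danish/Norwegian alphabet is a-z followed by æ, ø, å.
DANISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"
RANK = {ch: i for i, ch in enumerate(DANISH_ALPHABET)}


def danish_key(word):
    """Map each character to its position in the Danish/Norwegian alphabet.

    Characters outside the alphabet sort after all letters, by code point.
    """
    return [RANK.get(ch, len(RANK) + ord(ch)) for ch in word.lower()]


words = ["åre", "zebra", "ære", "øre", "abe"]

# Plain code-point order puts å (U+00E5) before æ (U+00E6) and ø (U+00F8).
codepoint_sorted = sorted(words)

# The language-aware key restores the alphabet order æ, ø, å.
danish_sorted = sorted(words, key=danish_key)

print(codepoint_sorted)  # ['abe', 'zebra', 'åre', 'ære', 'øre']
print(danish_sorted)     # ['abe', 'zebra', 'ære', 'øre', 'åre']
```

In a multilingual app you would pick a key like this per user or per language at display time, rather than relying on a single column collation.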
If it supports a single language, you'll be fine setting the correct collation at the DB level. In your case that is indeed utf8_danish_ci, since it's the same as Norwegian if Wikipedia is anything to go by.
In case you'd like to read more on collation, the ICU docs have some colorful examples of how thorny this kind of stuff gets. Quoting extensively:
http://userguide.icu-project.org/collation
[H]ere are some of the ways languages vary in ordering strings:
The letters A-Z can be sorted in a different order than in English. For example, in Lithuanian, "y" is sorted between "i" and "k".
Combinations of letters can be treated as if they were one letter. For example, in traditional Spanish "ch" is treated as a single letter, and sorted between "c" and "d".
Accented letters can be treated as minor variants of the unaccented letter. For example, "é" can be treated as equivalent to "e".
Accented letters can be treated as distinct letters. For example, "Å" in Danish is treated as a separate letter that sorts after "z".
Unaccented letters that are considered distinct in one language can be indistinct in another. For example, the letters "v" and "w" are two different letters according to English. However, "v" and "w" are considered variant forms of the same letter in Swedish.
A letter can be treated as if it were two letters. For example, in traditional German "ä" is compared as if it were "ae".
Thai requires that the order of certain letters be reversed.
French requires that letters sorted with accents at the end of the string be sorted ahead of accents in the beginning of the string. For example, the word "côte" sorts before "coté" because the acute accent on the final "e" is more significant than the circumflex on the "o".
Sometimes lowercase letters sort before uppercase letters. The reverse is required in other situations. For example, lowercase letters are usually sorted before uppercase letters in English. Latvian letters are the exact opposite.
Even in the same language, different applications might require different sorting orders. For example, in German dictionaries, "öf" would come before "of". In phone books the situation is the exact opposite.
Sorting orders can change over time due to government regulations or new characters/scripts in Unicode.
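Two of the rules quoted above are easy to sketch in Python: a combination of letters treated as one letter (traditional Spanish "ch") and a letter treated as two letters (traditional German "ä" compared as "ae"). The function names `traditional_spanish_key` and `german_ae_key` are invented for this illustration, and each key handles only the single rule named in the quote, so treat it as a toy and not as a full collation for either language.

```python
def traditional_spanish_key(word):
    """Treat 'ch' as one letter that sorts after 'c' and before 'd'."""
    w = word.lower()
    key = []
    i = 0
    while i < len(w):
        if w.startswith("ch", i):
            key.append(("c", 1))  # ("c", 0) < ("c", 1) < ("d", 0)
            i += 2
        else:
            key.append((w[i], 0))
            i += 1
    return key


def german_ae_key(word):
    """Compare 'ä' as if it were the two letters 'ae'."""
    return word.lower().replace("ä", "ae")


spanish_words = ["dama", "chico", "cuna", "casa"]
spanish_plain = sorted(spanish_words)
spanish_traditional = sorted(spanish_words, key=traditional_spanish_key)

german_words = ["Baf", "Bär", "Bad"]
german_plain = sorted(german_words)
german_expanded = sorted(german_words, key=german_ae_key)

print(spanish_plain)        # ['casa', 'chico', 'cuna', 'dama']
print(spanish_traditional)  # ['casa', 'cuna', 'chico', 'dama']
print(german_plain)         # ['Bad', 'Baf', 'Bär']
print(german_expanded)      # ['Bad', 'Bär', 'Baf']
```

The same input list comes out in a different order depending on which language's rule you apply, which is exactly why one column collation can't serve a table that mixes languages.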