Locating potential duplicate before switching collation in MySQL

by Martin Westin in


When you need to switch a fields collation you can check if any existing data will conflict with this change. For example a keyword-field might contain data that is considered unique in one collation but not in another. The Swedish character ä is considered a unique character in swedish collation but in general collation it is only an a in an accented form.

Comparing the results from these two queries will show what effect, if any, an change in collation will have.

SELECT id, keyword as key, count(*) as num FROM keywords GROUP BY key having num > 1;
SELECT id, keyword COLLATE utf8_unicode_ci as key, count(*) as num FROM keywords GROUP BY key having num > 1;

Altering character set and collation of tables in MySQL

by Martin Westin in


Altering the character set and collation of a table is sometimes not enough. You may have to alter the actual fields in the table to get MySQL to comply in some cases. I don't know why or when MySQL does this.

ALTER TABLE my_table DEFAULT CHARSET=utf8 COLLATE=utf8_swedish_ci;

ALTER TABLE my_table MODIFY my_field varchar(255) CHARACTER SET utf8 COLLATE utf8_swedish_ci;

Preferred character set and collation in MySQL

by Martin Westin in


My preferred character set is utf8.One big drawback in many cases can be that the default collation ignores accents for characters. The result is that "halla" and "hallå" is interpreted as the same word. This is of-course bad for storing general data such as names users or products that may very well have accented characters that should be respected.

The sollution for me is to not use "utf8_general_ci".

For generic data that have no particular language I prefer "utf8_bin" For data that will be sorted and that can be given a "language" I prefer the collation most appropriate, such as "utf8_swedish_ci"