Locating potential duplicate before switching collation in MySQL

by Martin Westin in


When you need to switch a fields collation you can check if any existing data will conflict with this change. For example a keyword-field might contain data that is considered unique in one collation but not in another. The Swedish character ä is considered a unique character in swedish collation but in general collation it is only an a in an accented form.

Comparing the results from these two queries will show what effect, if any, an change in collation will have.

SELECT id, keyword as key, count(*) as num FROM keywords GROUP BY key having num > 1;
SELECT id, keyword COLLATE utf8_unicode_ci as key, count(*) as num FROM keywords GROUP BY key having num > 1;