Thanks rsp. But I am using the exact same collation (utf8mb3_general_ci). Read on till the end…
I have performed further tests:
- Looking directly at my tables through a MySQL tool (I use Navicat), the umlauts show up correctly.
- If I export them through Navicat into CSV or TXT, they show up with the same control characters.
- This leads me to believe that the problem may be more on the MySQL side than with SuiteCRM.
- From the SQL dump of my DB, I noticed that the initial SET NAMES statement uses utf8mb4 (SET NAMES utf8mb4;) whereas all individual tables use utf8mb3 (ENGINE=InnoDB DEFAULT CHARSET=utf8mb3;). I have no idea where the initial mb4 comes from. According to the MySQL documentation, SET NAMES only sets the character set used for the client connection, so all characters are recognised, but it has no impact on how the tables themselves are stored.
My next thought was: what if I change the encoding of the DB to utf8mb4?
I created a new database on my server (AWS RDS) with Character Set utf8mb4 and Collation utf8mb4_general_ci. I added a table and one record containing umlauts. However, exporting that record resulted in exactly the same control characters!
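In hindsight, a quick check shows why switching to utf8mb4 could not have changed anything for umlauts. This is only a sketch: the helper name fits_utf8mb3 is made up for illustration, and it relies on utf8mb3 storing at most 3 bytes per character while utf8mb4 allows up to 4.

```python
def fits_utf8mb3(text: str) -> bool:
    """Return True if every character needs at most 3 bytes in UTF-8.

    Hypothetical helper: utf8mb3 stores up to 3 bytes per character,
    utf8mb4 up to 4, so only 4-byte characters require utf8mb4.
    """
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)


# German umlauts and sharp s take 2 bytes each in UTF-8.
umlaut_sizes = {ch: len(ch.encode("utf-8")) for ch in "äöüÄÖÜß"}

if __name__ == "__main__":
    print(umlaut_sizes)
    print(fits_utf8mb3("Müller Straße"))   # umlauts fit into utf8mb3
    print(fits_utf8mb3("\U0001F600"))      # an emoji needs 4 bytes
```

So for text with umlauts, utf8mb3 and utf8mb4 hold exactly the same bytes; the difference only matters for characters such as emoji.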
The weird thing is that if I dump the data, the umlauts are there…
OK, so far my workflow has been: export into a CSV file, then open the CSV file in MS Excel for further analysis, etc. I found a reference to the problem in this post: Solved: UTF-8 problem in mySQL | Experts Exchange. It turns out that Excel simply assumes ANSI as the character set of CSV files, so it is Excel that garbles the input.
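For anyone who wants to reproduce the garbling without Excel, here is a minimal sketch. It assumes that "ANSI" means Windows-1252 (cp1252), which is the usual code page on Western European Windows installs; on other locales it may be a different one. The function names are my own.

```python
def read_as_ansi(text: str) -> str:
    """Encode text as UTF-8, then decode the bytes as cp1252.

    This mimics a program that opens a UTF-8 file while assuming ANSI.
    Assumption: ANSI == Windows-1252. Bytes that cp1252 does not define
    are replaced instead of raising an error.
    """
    return text.encode("utf-8").decode("cp1252", errors="replace")


def repair(garbled: str) -> str:
    """Undo the mix-up, as long as no bytes were lost on the way."""
    return garbled.encode("cp1252").decode("utf-8")


if __name__ == "__main__":
    garbled = read_as_ansi("Müller")
    print(garbled)          # each umlaut turns into two odd characters
    print(repair(garbled))  # back to the original text
```

The data in the file is fine the whole time; only the interpretation of the bytes is wrong.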
NOTHING TO DO WITH SUITECRM OR MYSQL. IT’S BLOODY MICROSOFT!
This post describes how to import utf8 encoded files into MS Excel: How to import a .csv file that uses UTF-8 character encoding - ITG Computing Support | Institute for Advanced Study
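An alternative to the manual import, if you generate the CSV yourself: write it with a UTF-8 byte order mark (BOM). As far as I know, recent Excel versions detect the BOM and then read the file as UTF-8 when you open it directly, but I have not checked every version, so treat this as a sketch. The function name and sample rows are made up.

```python
import csv
import io


def to_excel_csv_bytes(rows) -> bytes:
    """Serialise rows to CSV and encode as UTF-8 with a leading BOM.

    Python's "utf-8-sig" codec prepends the bytes EF BB BF, which Excel
    is assumed to use as a hint that the file is UTF-8.
    """
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(rows)
    return buf.getvalue().encode("utf-8-sig")


if __name__ == "__main__":
    data = to_excel_csv_bytes([["name", "city"], ["Müller", "Köln"]])
    with open("export.csv", "wb") as fh:  # hypothetical output file
        fh.write(data)
```

If you write the file directly with open(), passing encoding="utf-8-sig" and newline="" does the same thing.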
May it save time for the next poor soul…